DECLARATION OF PIA M. CHALLITA-EID
[illegible] BLOT DATA



Assistant Commissioner for Patents 
Washington, D.C. 20231 

Dear Sir:

I, Pia M. Challita-Eid, declare as follows: 

1. I have a Ph.D. in Microbiology from the University of Southern California, did postdoctoral
work at the University of California at Los Angeles, and was a faculty member at the
University of Rochester. I have been practicing in the field of molecular biology for over
10 years. At Agensys, I am the Group Leader of Gene Discovery. In my position at
Agensys, I have responsibility for evaluating the levels of expression of various genes in
tissues. A copy of my curriculum vitae is enclosed as Exhibit A.

2. Our company, Agensys, is dedicated to discovery of proteins that are highly expressed in
various tumor tissues as compared to normal tissues. The company approaches this
discovery task by first identifying cDNAs which correspond to genes overexpressed in
tumor tissue using the technique of suppression subtractive hybridization (SSH). In this
technique, cDNA from normal tissues is subtracted from cDNA from tumor tissues.
Thereby cDNA present in tumor tissues, but not in normal tissues, is isolated. Thus, on a
gene-by-gene basis, this approach can indicate that a gene corresponding to the cDNA is
overexpressed in tumor cells.

3. Typically, the next step is to utilize the sequence information obtained from SSH to
obtain a full-length DNA clone which includes the entire open reading frame for the
protein corresponding to this cDNA.

4. In addition, the level of expression of the corresponding gene is determined in various
[text illegible] production of messenger RNA that encodes the protein is a necessary step in
the production of the protein itself. Therefore, detection [remainder of sentence illegible]

5. Northern blotting is a detection method of relative levels of mRNA expression of a gene.
It is a procedure in which specific mRNA is measured using a nucleic acid hybridization
technique. The signal is detected on an autoradiogram. The stronger the signal, the more
abundant is the mRNA. For genes that produce mRNA containing an open reading
frame flanked by a good Kozak [text illegible]
site are discussed in greater detail in paragraph 7, below.

6. The evidence referred to in paragraphs 3, 4 and 5 above is consistent with the general
knowledge in the art of molecular biology [text illegible]
mRNA is predictive of expression of its encoded protein. This is particularly true for
mRNA with an open reading frame and a Kozak consensus sequence for translation
initiation.

U.S.S.N. 09/697,206

7. The Kozak consensus sequence for translation initiation, CCACCATGG, where the ATG
start codon is noted, is the sequence with the highest established probability of initiating
translation. This was confirmed by Peri and Pandey, Trends in Genetics (2001) 12: 685-
687 (Exhibit B). Peri describes a study of over 1500 translation initiation sites in order to
address actual mRNA translation initiation. This study showed that the most authentic
initiation sequence has three or fewer mismatches from the optimum consensus
sequence CCACCATGG. The sequence of the translation initiation site of 20P2H08,
GCCACAATGA, shows only 2 nucleic acid differences from the optimum Kozak
consensus. Altogether, these data demonstrate that the translation initiation site of
20P2H08 is functional and can initiate protein translation. It is worth noting that Peri and
Pandey's interpretation of their data is misleading in the context of the present issue,
namely the predictive value of the Kozak consensus sequence. Nevertheless, Peri's data
confirm the predictive value of the Kozak consensus. The data in Peri's Table 1 make
clear that the single best prediction of an authentic start site is an exact Kozak sequence
which encompasses the start methionine. The predictive value of the Kozak consensus
sequence is further validated by the data in Figure 1 of Peri. Figure 1(a) sets forth the
percentage of verified protein-encoding transcripts compared to conformance to the
Kozak consensus; the statistical probability of the Kozak mismatches is set forth in the
Figure legend. By comparing the percent occurrence of the protein-encoding transcripts
to the percent occurrence based on random probability, it is shown that encoding
transcripts with an exact Kozak occur 7000% more frequently than by random
probability. The relative percentages for mismatches in the encoding transcripts relative
to a random occurrence for that number of mismatches are: 1 mismatch 1275%, 2
mismatches 533%, 3 mismatches 205%, 4 mismatches 94%, 5 mismatches 45%, 6
mismatches 28%. Accordingly, by the time there are more than three Kozak mismatches,
the existence of such an encoding transcript is more likely than not to have occurred
merely by random chance.
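For illustration only, the mismatch count discussed in paragraph 7 can be sketched in Python. The sketch is not part of the declaration and is not the method used by Peri and Pandey. It assumes that the consensus reads CCACCATGG and the 20P2H08 site reads GCCACAATGA (both as read from the scanned text), and that the two sequences are compared base by base after aligning them on their first ATG. The function names are hypothetical. The percentages in the table are the ones recited in paragraph 7.

```python
# Assumed consensus, as read from paragraph 7 of the declaration.
KOZAK_CONSENSUS = "CCACCATGG"

# Relative frequency (percent of random expectation) recited in paragraph 7,
# keyed by number of mismatches; 0 means an exact Kozak match.
REPORTED_RELATIVE_FREQUENCY = {0: 7000, 1: 1275, 2: 533, 3: 205, 4: 94, 5: 45, 6: 28}


def kozak_mismatches(site: str, consensus: str = KOZAK_CONSENSUS) -> int:
    """Count base differences between a site and a consensus, aligned on the first ATG."""
    site, consensus = site.upper(), consensus.upper()
    s_atg, c_atg = site.find("ATG"), consensus.find("ATG")
    if s_atg < 0 or c_atg < 0:
        raise ValueError("both sequences must contain an ATG start codon")

    mismatches = 0
    # Upstream bases: walk leftward from the base immediately before ATG.
    for k in range(1, min(s_atg, c_atg) + 1):
        if site[s_atg - k] != consensus[c_atg - k]:
            mismatches += 1
    # Downstream bases: compare whatever follows the ATG in both sequences.
    for a, b in zip(site[s_atg + 3:], consensus[c_atg + 3:]):
        if a != b:
            mismatches += 1
    return mismatches


def mismatch_counts_above_chance() -> list[int]:
    """Mismatch counts whose recited relative frequency exceeds 100% of random."""
    return [n for n, pct in sorted(REPORTED_RELATIVE_FREQUENCY.items()) if pct > 100]


if __name__ == "__main__":
    print(kozak_mismatches("GCCACAATGA"))
    print(mismatch_counts_above_chance())
```

Under these assumptions the 20P2H08 site differs from the consensus at two positions, one immediately upstream of the ATG and one immediately downstream, which matches the count stated in paragraph 7. The second function simply restates the threshold argument: only counts of three or fewer mismatches have a recited frequency above 100% of the random expectation.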

8. The Northern blot technique is used as a routine procedure (as compared to Western
blotting, immunoblotting or immunohistochemistry) because it does not require the time
delays involved in isolating or synthesizing the protein, preparing an immunological
composition of the protein, eliciting a humoral immune response, harvesting the
antibodies, and verifying the specificity thereof. All of these things can be done, but they
take time, and the presence of mRNA on Northern blots, especially in comparative
tissues, is a recognized indication that the protein itself will be produced.

9. I am familiar with the general practice of Northern blotting and interpretation, described
above, being carried out, not only at Agensys, but also at other companies that seek to
evaluate gene expression in various tumor and other tissues. The use of Northern blots as
a means for evaluating protein production is universally accepted as reliable and is
therefore widely practiced.

10. It is understood that the absolute levels of messenger RNA present and the amounts of
protein produced do not always provide a 1:1 correlation. However, in those instances
where the Northern blot has shown mRNA to be present, it is almost always possible, in
my experience, when the time is taken to do so, to detect the presence of the
corresponding protein in the tissue which provided a positive result in the Northern blot.
The levels of the protein compared to the levels of the mRNA may be disjunctive, but it is
inaccurate to say that there is no correlation between protein levels and mRNA levels as a
general matter. In general, cells that exhibit detectable mRNA also exhibit detectable
corresponding protein and vice versa. This is particularly true where the mRNA has an
open reading frame and a good Kozak sequence.

11. Ironically, studies seeking to determine the overall pattern of correlation between mRNA
and corresponding protein have generally started with displaying the protein fingerprint of
a particular cell or tissue. For instance, an article by Anderson, L. and Seilhamer, J.,
Electrophoresis (1997) 18:533-537 (Exhibit C) describes such a study on a patient liver.
A 2D gel was obtained to determine the pattern of proteins in the liver, and a cDNA
library was used to determine the pattern for mRNA. The authors found that of 23
selected proteins which could be identified from the gel, mRNA for 19 were detected in
the transcript images. Thus, in the vast majority of cases, there was both mRNA and
protein present. The authors found that the levels of RNA units to protein units had a
correlation coefficient of 0.48. As they state, this number is intriguingly close to the
middle position between a perfect correlation (1.0) and no correlation whatever (0.0).
Only a correlation coefficient of (0.0) would support a proposition that mRNA presence
provides no indication of protein presence. The conclusion has to be that in the vast
majority of instances, [text illegible]
mRNA expression and protein expression.
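For illustration only, the two quantities mentioned in connection with the Anderson and Seilhamer study can be sketched in Python: the share of selected proteins for which mRNA was detected (19 of 23), and a correlation coefficient on the scale from 0.0 to 1.0 referred to above. The sketch assumes the coefficient is the ordinary Pearson coefficient, which the declaration does not state, and the function name is hypothetical. No data from the study are reproduced here; the numbers passed to the function in the usage lines are placeholders chosen only to show the two ends of the scale.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / sqrt(var_x * var_y)


# Share of the 23 selected proteins for which mRNA was detected (19 of 23).
detected_fraction = 19 / 23

if __name__ == "__main__":
    print(f"mRNA detected for {detected_fraction:.1%} of selected proteins")
    # Placeholder values only: a perfectly linear relationship and its mirror image.
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
    print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

A coefficient of 0.48 falls between the two reference points named in the paragraph, 0.0 for no linear relationship and 1.0 for a perfect one.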

12. An article by Oh, J.M.C., et al., Proteomics (2001) 1:1303-1319 (Exhibit D) describes a
database of protein expression in lung cancer. Again, the study sought to determine the
correlation between mRNA and corresponding protein beginning with protein fingerprint
[text illegible]
mRNA expression was evaluated using microarrays. The approach is suggested as a
general tool for evaluating the correlation between mRNA expression and protein
expression. Clearly the fundamental premise of the approach is that the correlation will
not be zero, or the tool would not even be proposed, let alone published in a peer-
reviewed journal.

13. I am aware of a publication by Fu, L., et al., EMBO Journal (1996) 15:4392-4401
(Exhibit E) which reports an extremely rare occurrence where, in a certain percentage of
leukemia patient blood samples, there was zero production of "detectable" protein in the
presence of corresponding mRNA. These data are for the specific protein p53. It is
important to note that the p53 protein is known to be extremely unstable (see, e.g.,
Molecular Cell Biology 4th Ed., Lodish et al. eds. p. 531 (W.H. Freeman, New York
2000) (Exhibit F)). This fact is noted by Fu et al., at page 4393 where they acknowledge
that p53 with a half-life less than 15 minutes may "remain undetectable by this assay." In
Figure 3 of Fu et al. the data indicate that there was detectable protein production in 44%
of the samples. I am not familiar with any instance where no protein is scientifically
documented in the presence of corresponding mRNA. The situation addressed by Fu et
al. may be an exception to the rule that there is a positive correlation between mRNA
presence and protein production, provided the protein truly failed to exist and not that the
protein was simply undetectable in their assay format.

14. As addressed in paragraph 13, in many cases, a reported lack of protein expression is due
to technical limitations of the protein detection assay. For instance, the available
antibody may only detect denatured protein but not native protein present in a cell. In
other instances, the half-life of the protein is very short; thereby the steady-state protein
levels are below detectable range. Short-lived proteins are still functional, and some have
been previously described to induce tumor formation as shown in the article by Reinstein,
et al., Oncogene 19: 5944-50 (Exhibit G). In such situations, when more sensitive
detection techniques are performed and/or other antibodies are generated, protein
expression is detected. When studies fail to take these principles into account, they are
likely to report artifactually lowered correlations of mRNA to protein.

15. For most genes, when they produce mRNA that contains an open reading frame flanked by a
good Kozak translation initiation site and a stop codon, the synthesized mRNA codes for a
protein. Analysis of 22P4F11 shows a strong mRNA signal on Northern blot in cancer
tissues, and the mRNA sequence shows an open reading frame containing a good Kozak
initiation site and a stop codon. Therefore, production of 22P4F11 protein is reasonably
predicted based on these data.

16. In summary, the scientific community regards the presence of mRNA in cells as indicative
of the production of protein. This is particularly true when the Northern data are strong
and the mRNA has an open reading frame and a good Kozak sequence. It is understood
that the correlation of mRNA and protein levels is not perfect; however, an instance
where protein is absent although mRNA is present is at best a rare exception, and may
merely be an artifact of the assay system used.

17. The use of positive Northern blots as both predictive and actually indicative of protein
production is a recognized conclusion of scientists in this field.

18. I declare that all statements made herein of my own knowledge are true and that all
statements made on information and belief are believed to be true; and further, that these
statements are made with the knowledge that willful false statements and the like so
made are punishable by fine or imprisonment or both, under Section 1001 of Title 18 of
the United States Code and that such willful false statements may jeopardize the validity
of the application or any patent issued thereon.

Executed at Santa Monica, California on 31 October 2002. 



Pia M. Challita-Eid
Curriculum Vitae
PIA M. CHALLITA-EID, PH.D.

Personal information

Work address:                Agensys, Inc.
                             1545 17th Street
                             Santa Monica, CA 90404

Email:                       pchallita@agensys.com

Home address:                15745 Morrison Street
                             Encino, CA 91436

Appointments:

Group Leader,                Agensys, Inc.
Research Scientist III       October 2001-Present
Gene Discovery

Research Scientist II        Agensys, Inc.
                             August 2000-Present

Assistant Professor          University of Rochester
in Medicine,                 Cancer Center
Microbiology &               Hematology/Oncology Unit
Immunology                   July 1998-June 2000

Senior Instructor            University of Rochester
                             Cancer Center
                             Department of Oncology
                             January 1996-June 1998

Education:

B.S. Biology                 American University of Beirut-Lebanon
                             1984-1987

M.S. Microbiology            American University of Beirut-Lebanon
                             1987-1989

Ph.D. Microbiology           University of Southern California
                             Department of Microbiology
                             January 1990-June 1994

Advisor:                     Donald B. Kohn, M.D., Associate Professor
                             Departments of Pediatrics and Microbiology
                             Division of Research Immunology and Bone Marrow Transplantation
                             Childrens Hospital of Los Angeles
                             University of Southern California, California, USA

Postdoctoral fellowship      University of California Los Angeles
                             Department of Hematology-Oncology
                             September 1994-December 1995

Advisor:                     Joseph D. Rosenblatt, M.D., Assistant Professor
                             School of Medicine
                             Department of Hematology-Oncology
                             University of California, Los Angeles, California

Students and Research Associates Mentored:

Currently leading the Gene Discovery group of 6 research associates. Previous students and research
associates mentored are listed below.

1. Skelton Diane, Research Associate, 1992-1994.

2. El-Khoueiry Anthony, Undergraduate student, Summer 1992 and 1993. Currently Fellow at
the USC Medical Center.

3. Poles Tina, Research Associate, 1996-1998.

4. Mosammaparast Nima, Undergraduate student, June 1996 - September 1997. Currently
enrolled in Medical School.

5. Zoric Bojan, Undergraduate student, June 1997 - June 1998. Currently enrolled in Medical
School.

6. Rimel BJ, Research Associate, June 1998 - June 1999.

7. Vicki Houseknecht, Research Associate, June 1999 - June 2000.

8. Facciponte John, Graduate student in the Microbiology and Immunology Department at the
University of Rochester, January 1998 - June 2000. Currently a graduate student at Roswell
Park Cancer Center, Buffalo, NY.

9. Kyung Yi, Graduate Student in Microbiology, January 1999 - June 2000.

10. Anagha Joshi, Post-doctoral fellow, October 1999 - June 2000.

Patents:

In the last year, I have been involved in the filing of greater than 40 applications.

1) "Retroviral Vectors for Expression in Embryonic Cells", US5707865, issued date Jan. 13, 1998.

2) "Chimeric Proteins for the Stimulation of a Tumor-Specific Immune Response", application
in progress.



Invited Presentations: 



October 1994 "Retroviral Vector Expression in Murine Stem Cells". Department of 

Hematology-Oncology, UCLA Gene Therapy Program, Los Angeles, California. 

October 1997 "Antibody Fusion Proteins for the Specific Recruitment and Activation of an 
Anti-Tumor immune Response". Childrens Hospital of Los Angeles, Los 
Angeles, California. 

February 1998 Regional Cancer Center Consortium for Biological Therapy. Roswell Park 
Cancer Institute, Buffalo, New York. 

July 1998 American Cyanamid Company. Lederle-Praxis Biologicals Division, Rochester, 
New York. 

October 1999 "Monoclonal Antibody Technology in the Era of Genetic Engineering" Brazilian 
Meeting on Biosafety and Transgenic Products, Rio De Janeiro, Brazil. 

June 1999 "Breast Cancer Research in the Era of Genetic Engineering", Breast Cancer 
Coalition of Rochester, Rochester, NY. 



Awards: 

Graduate Student Research Forum Award. Silencing of retroviral vectors after transduction of 
hematopoietic stem cells is associated with methylation. Graduate Student Research Forum 
Poster Session. USC Medical School, Los Angeles, California, 1993. 

Presidential Award. Society of Biological Therapy, Pasadena, California, October 1997. 
Merit Award. American Society of Clinical Oncology, California, May 1998. 



Grants/Funds:

1) Jonsson Cancer Center Foundation/UCLA
Fellowship Seed Grant
Title: "Antigen Processing in Human Neural Crest Tumors"
Effective Dates: 11/1/95-10/31/96
Amount: $27,707

2) Rochester Area Foundation
Lucille B. Kesel Fund for the Advancement of Cancer Research
Title: "Antibody Fusion Proteins for Eradication of Minimal Residual Disease"
Effective Dates: 1/1/98-12/31/98
Amount: $8,000

3) University of Rochester Cancer Center
Interim and Pilot Project Funding
P.I.: Joseph D. Rosenblatt, M.D.
Co-P.I.: Pia M. Challita-Eid, Ph.D.
Title: "Antibody Fusion Proteins for the Therapy of Cancer".
Effective Dates: 1/1/98-12/31/98
Amount: $25,000

4) Sinsheimer Scholar Award
Title: "Genetically-Engineered Chemokine Antibody Fusion Proteins for Breast and
Ovarian Cancer Therapy"
Effective Dates: 7/1/98-6/30/01
Amount: $40,000/year

5) NIH/NCI
P.I.: Joseph D. Rosenblatt, M.D.
Co-P.I.: Pia M. Challita-Eid, Ph.D.
Title: "Recruitment and Activation of an Anti-tumor Response using Antibody-Fusion
Proteins"
Effective Dates: 12/1/98-11/30/03
Amount: $191,046/year

6) NIH/NCI - Rapid Access to Intervention Development (RAID)
Title: "Preclinical Development of a B7.1 Anti-HER2/neu Antibody Fusion Protein"
Effective Date: Approved April, 1999
Amount: Not applicable

7) ACS Institutional grant
Title: "Chemokine Directed Targeting of Cytotoxic TALL-104 Cells"
Effective Dates: 9/1/99-8/30/00
Amount: $8,000

8) Breast Cancer Coalition of Rochester
Title: "Breast Cancer Research"
Date: 9/99
Amount: $1,000



Publications:

Gersuk GM, Westermark B, Mohabeer AJ, Challita PM, Pattamakom S, and Pattengale PK.
Inhibition of human natural killer cell activity by platelet-derived growth factor (PDGF).
III. Membrane binding studies and differential biological effects of recombinant PDGF
isoforms. Scand J Immunol 33: 521-532, 1991.

Gersuk GM, Carmel R, Challita PM, Rabinowitz AP, and Pattengale PK. Quantitative and
functional studies of impaired natural killer (NK) cells in patients with myelofibrosis,
essential thrombocythemia, and polycythemia vera. I. A potential role for platelet-
derived growth factor in defective NK cytotoxicity. Nat Immun 12: 136-151, 1993.

Challita PM, and Kohn DB. Lack of expression from a retroviral vector in murine
hematopoietic stem cells is associated with methylation in vivo. Proc Natl Acad Sci
(USA) 91: 2567-2571, 1994.

Krall W, Challita PM, Perlmutter L, Skelton D, and Kohn DB. Cells expressing human
glucocerebrosidase from a retroviral vector repopulate macrophages and central nervous
system microglia after murine bone marrow transplantation. Blood 83: 2737-2748, 1994.

Challita PM, Skelton D, Yu XJ, El-Khoueiry A, Yu X-J, Weinberg KI, and Kohn DB.
Multiple modifications in cis elements of the long terminal repeat of retroviral vectors
leads to increased expression and decreased DNA methylation in embryonic carcinoma
cells. J Virol 69: 748, 1995.

Ucar K, Seeger RC, Challita PM, Watanabe CT, Yen TL, Morgan JP, Amado R, Chou E,
McCallister T, Barber JR, Jolly DJ, Reynolds P, Gangavalli R, and Rosenblatt JD.
Sustained cytokine production and immunophenotypic changes in human
neuroblastoma cell lines transduced with a human gamma interferon vector. Cancer Gene
Therapy 2: 171, 1995.

Lu Y, Planelles V, Palaniappan C, Li X, Challita-Eid PM, Amado R, Stephens D, Kohn DB,
Bakker A, Day B, Bambara RA, and Rosenblatt JD. Inhibition of HIV-1 replication using
a mutated tRNALys3 primer. J Biol Chem 272: 14523, 1997.

Challita-Eid PM, Penichet ML, Shin SU, Poles T, Mosammaparast N, Mahmood K, Slamon DJ,
Morrison SL, and Rosenblatt JD. A B7.1-antibody fusion protein retains antibody
specificity and ability to activate the T cell costimulatory pathway. J Immunol 160:
3419-3426, 1998.

Challita-Eid PM, Abboud CN, Morrison SL, Penichet ML, Rosell KE, Poles T, Hilchey SP,
Planelles V, and Rosenblatt JD. A RANTES-antibody fusion protein retains antigen
specificity and chemokine function. J Immunol 161: 3729, 1998.

Challita-Eid PM, Rosenblatt JD, Day B, Rimel BJ and Planelles V. Inhibition of HIV-1 infection
with a RANTES.IgG3 fusion protein. AIDS Research and Human Retroviruses 14:1617,
1998.

Mahmood K, Federoff HJ, Challita-Eid PM, Day B, Haltman M, Atkinson M, Planelles V, and
Rosenblatt JD. Eradication of pre-established lymphoma using HSV amplicon vectors.
Blood 93: 643, 1999.

Penichet ML, Challita PM, Shin S-U, Sampogna S, Rosenblatt JD, and Morrison SL. In vivo
properties of three human HER2/neu-expressing murine cell lines in immunocompetent
mice. Laboratory Animal Science 49: 179-88, 1999.

Penichet ML, Dela Cruz JS, Challita-Eid PM, Rosenblatt JD, Morrison SL. A murine B cell
lymphoma expressing human HER2/neu undergoes spontaneous tumor regression and
elicits antitumor immunity. Cancer Immunol Immunother 49:649-62, 2001.

Hilchey SP, Rosebrough SF, Morrison SL, Rosenblatt JD, and Challita-Eid PM. Specific
targeting and stimulation of in vivo anti-tumor response using a B7.1 T-cell
costimulatory antibody fusion protein. Manuscript in preparation.

Select Abstracts and Presentations:

Challita PM, El-Khoueiry AB, and Kohn DB. Silencing of retroviral vectors after transduction
of murine hematopoietic stem cells is associated with methylation. Blood 80 (10 Suppl. 1):
168a, 1992.

Challita PM, Cook C, Sender LS, and Kohn DB. Novel retroviral vectors for consistent
expression after transduction into hematopoietic stem cells. Keystone Symposium
on Gene Therapy, Keystone, Colorado, 1993.

Challita PM. Retroviral vector expression in murine stem cells. Presentation. Division
of Hematology-Oncology, University of California Los Angeles, October, 1994.

Challita PM, Shin S-U, Penichet M, Mahmood K, Poles TM, Rosell KE, Abboud CN, Morrison
SL, Rosenblatt JD. Novel Antibody Fusion Proteins for the Stimulation of a Tumor-
Specific Immune Response. Keystone Symposium on Cellular Immunology and
Immunotherapy of Cancer, Copper Mountain, Colorado, January 1997.

Penichet ML, Challita PM, Shin S-U, Slamon DJ, Rosenblatt JD, and Morrison SL. In vivo
properties of two human her2/neu expressing murine cell lines in immunocompetent
mice. Multidisciplinary Approaches to Cancer Immunotherapy, Bethesda, Maryland,
June 1997.

Challita PM, Abboud CN, Rosell KE, Penichet ML, Poles T, Mahmood K, Morrison SL, and
Rosenblatt JD. Characterization of a RANTES-antibody fusion protein for cancer
immunotherapy. Multidisciplinary Approaches to Cancer Immunotherapy, Bethesda,
Maryland, June 1997.

Horwitz S, Rosenblatt JD, Mosammaparast N, Poles T, Abboud CN, and Challita PM. Gene-
modified EL4 cells expressing the chemokine RANTES protects from tumor growth and
stimulates an anti-tumor cytotoxic T-lymphocyte response in vivo. Multidisciplinary
Approaches to Cancer Immunotherapy, Bethesda, Maryland, June 1997.

Challita-Eid PM, Morrison SL, Penichet ML, Rosenblatt JD. Antibody-T cell costimulatory
ligand fusion protein for the stimulation of a specific anti-tumor immune response.
American Society of Hematology, San Diego, California, December 1997.

Challita-Eid PM, Abboud CN, Penichet ML, Rosell KE, Morrison SL, Rosenblatt JD. Antibody
fusion proteins for the recruitment and activation of an anti-tumor immune response.
American Association for Cancer Research. New Orleans, Louisiana, March 1998.

Challita-Eid PM, Hilchey SP, and Rosenblatt JD. An anti-HER2/neu RANTES
fusion protein induces effector cell infiltration to the site of HER2/neu expressing
tumors. AACR/NCI/EORTC Molecular Targets and Cancer Therapeutics, Washington
DC, November 1999.

Facciponte JG, Rosenblatt JD, Federoff HJ, Challita-Eid PM. Herpes simplex virus (HSV)
amplicon-mediated gene transfer of tumor associated antigens into bone marrow derived
dendritic cells. Keystone Symposium on Cellular Immunity and Immunotherapy of
Cancer, Santa Fe, New Mexico, January 2000.
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Curriculum Vitae 
PIA M- CHALLITA-EID, PH.D 



Personal information 



Work address: 
Email: 

Home address: 



Agensys, Inc. 

1545 17 th Street 

Santa Monica, CA 90404 

pchallita@agensys.com 

15745 Morrison Street 
Encino, CA 91436 



A ppointments: 

Group Leader, Agensys, Inc. 

Research Scientist III 

Gene Discovery October 2001-Present 

Research Scientist II Agensys, Inc. 

August 2000-Present 

University of Rochester 
Cancer Center 

Hematology/Oncology Unit 
July 1998- June 2000 

University of Rochester 
Cancer Center 
Department of Oncology 
January 1996- June 1998 



Assistant Professor 
in Medicine, 
Microbiology & 
Immunology 

Senior Instructor 



Education: 

B.S. Biology American University of Beirut-Lebanon 

1984-1987 

M.S. Microbiology American University of Beirut-Lebanon 

1987-1989 
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University of Southern California 
Department of Microbiology 
January 1990 - June 1994 

Donald B. Kohn, M.D., Associate, Professor 
Departments of Pediatrics and Microbiology 
Division of Research Immunology and Bone 

Marrow Transplantation 
Childrens Hospital of Los Angeles 
University of Southern California, California 
USA 

Postdoctoral fellowship University of California Los Angeles 

Department of Hematology-Oncology 
September 1994 - December 1995 

Advisor: Joseph D. Rosenblatt, M.D., Assistant Professor 

School of Medicine 
Department of Hematology-Oncology 
University of California, Los Angeles, California 



Ph.D. Microbiology 
Advisor: 



Students and Research Associates Mentored: 

Currently leading the Gene Discovery group of 6 research associates. Previous students and research 
associates mentored are listed below. 

1. Skelton Diane, Research Associate, 1992-1994. 

2. El-Khoueiry Anthony, Undergraduate student, Summer 1992 and 1993. Currently Fellow at 
the USC Medical Center. 

3. Poles Tina, Research Associate, 1996-1998. 

4. Mosammaparast Nima, Undergraduate student, June 1996 - September 1997. Currently 
enrolled in Medical School. 

5. Zoric Bojan, Undergraduate student, June 1997-June 1998. Currently enrolled in Medical 

School. 

6. Rimel BJ, Research Associate, June 1998-June 1999. 

7. Vicki Houseknecht, Research Associate, June 1999 - June 2000. 

8 Facciponte John, Graduate student in the Microbiology and Immunology Department at the 
University of Rochester, January 1998 - June 2000. Currently a graduate student at Roswell 
Park Cancer Center, Buffalo, NY. 

9. Kyung Yi, Graduate Student in Microbiology, January 1999 - June 2000. 

10. Anagha Joshi, Post-doctoral fellow, October 1999 - June 2000. 
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Patents: 

' In the last year, I have been involved in the filing of greater than 40 applications. 

» 1) "Retroviral Vectors for Expression in Embryonic Cells", US5707865, issued date Jan. 13, 1998. 
2) "Chimeric Proteins for the Stimulation of a Tumor-Specific Immune Response", application 
in progress. 



Invited Presentations: 

October 1994 "Retroviral Vector Expression in Murine Stem Cells" . Department of 

Hematology-Oncology, UCLA Gene Therapy Program, Los Angeles, California. 

October 1997 " Antibody Fusion Proteins for the Specific Recruitment and Activation of an 
Anti-Tumor immune Response". Childrens Hospital of Los Angeles, Los 
Angeles, California. 

February 1998 Regional Cancer Center Consortium for Biological Therapy. Roswell Park 
Cancer Institute, Buffalo, New York. 

July 1998 American Cyanamid Company. Lederle-Praxis Biologicals Division, Rochester, 
New York. 

October 1999 "Monoclonal Antibody Technology in the Era of Genetic Engineering" Brazilian 
Meeting on Biosafety and Transgenic Products, Rio De Janeiro, Brazil. 

June 1999 "Breast Cancer Research in the Era of Genetic Engineering", Breast Cancer 
Coalition of Rochester, Rochester, NY. 

Awards: 

Graduate Student Research Forum Award. Silencing of retroviral vectors after transduction of 
hematopoietic stem cells is associated with methylation. Graduate Student Research Forum 
Poster Session. USC Medical School, Los Angeles, California, 1993. 

Presidential Award. Society of Biological Therapy, Pasadena, California, October 1997. 
Merit Award. American Society of Clinical Oncology, California, May 1998. 



Grants/Funds: 

1) Jonsson Cancer Center Foundation/ UCLA 
Fellowship Seed Grant 

Title: "Antigen Processing in Human Neural Crest Tumors" 
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Effective Dates: 11/1/95-10/31/96 
Amount: $27,707 

2) Rochester Area Foundation 

Lucille B. Kesel Fund for the. Advancenent of Cancer Research 

Title: "Antibody Fusion Proteins for Eradication of Minimal Residual Disease" 

Effective Dates: 1/1/98-12/31/98 

Amount: $8,000 

3) University of Rochester Cancer Center 
Interim and Pilot Project Funding 
P.I.: Joseph D. Rosenblatt, M.D. 
Co-P.I.: Pia M. Challita-Eid, Ph.D. 

Title: "Antibody Fusion Proteins for the Therapy of Cancer". 
Effective Dates: 1/1/98-12/31/98 
Amount: $25,000 

, 4) Sinsheimer Scholar Award 

Title: "Genetically-Engineered Chemokine Antibody Fusion Proteins for Breast and 
Ovarian Cancer Therapy" 
Effective Dates: 7/1/98-6/30/01 
Amount: $40,000/ year 

5) NIH/NCI 

P.I.: Joseph D. Rosenblatt, M.D. 
Co-P.I.: Pia M. Challita-Eid, Ph.D. 

Title: "Recruitment and Activation of an Anti-tumor Response using Antibody-Fusion 
Proteins" 

Effective Dates: 12/1/98-11/30/03 
Amount: $191,046/ year 

6) NIH/NCI - Rapid Access to Intervention Development (RAID) 

Title: "Preclinical Development of a B7.1 Anti-HER2/neu Antibody Fusion Protein" 
Effective Date: Approved April, 1999 
Amount: Not applicable 

7) ACS Institutional grant 

Title: "Chemokine Directed Targeting of Cytotoxic TALL-104 Cells" 
Effective Dates: 9/1/99-8/30/00 
Amount: $8,000 

8) Breast Cancer Coalition of Rochester 
Title: "Breast Cancer Research" 
Date: 9/99 

Amount: $1,000 
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Gersuk GM, Westermark B, Mohabeer AJ, Challita PM, Pattamakom S, and Pattengale, PK. 

Inhibition of human natural killer cell activity by platelet-derived growth factor (PDGF). 
III. Membrane binding studies and differential biological effects of recombinant PDGF 
isoforms. Scand } Immunol 33: 521-532, 1991. 

Gersuk GM, Carmel R, Challita PM, Rabinowitz AP, and Pattengale PK. Quantitative and 
functional studies of impaired natural killer (NK) cells in patients with myelofibrosis, 
essential thrombocytopenis, and polycythemia vera. I. A potential role for platelet- 
derived growth factor in defective NK cytotoxicity. Nat Immun 12: 136-151, 1993. 

Challita PM, and Kohn DB. Lack of expression from a retroviral vector in murine
hematopoietic stem cells is associated with methylation in vivo. Proc Natl Acad Sci
(USA) 91: 2567-2571, 1994.

Krall W, Challita PM, Perlmutter L, Skelton D, and Kohn DB. Cells expressing human
glucocerebrosidase from a retroviral vector repopulate macrophages and central nervous
system microglia after murine bone marrow transplantation. Blood 83: 2737-2748, 1994.

Challita PM, Skelton D, Yu XJ, El-Khoueiry A, Yu X-J, Weinberg KI, and Kohn DB.
Multiple modifications in cis elements of the long terminal repeat of retroviral vectors
leads to increased expression and decreased DNA methylation in embryonic carcinoma
cells. J Virol 69: 748, 1995.
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Sustained cytokine production and immunophenotypic changes in human 
neuroblastoma cell lines transduced with a human gamma interferon vector. Cancer Gene 
Therapy 2: 171, 1995. 

Lu Y, Planelles V, Palaniappan C, Li X, Challita-Eid PM, Amado R, Stephens D, Kohn DB,
Bakker A, Day B, Bambara RA, and Rosenblatt JD. Inhibition of HIV-1 replication using
a mutated tRNALys3 primer. J Biol Chem 272: 14523, 1997.

Challita-Eid PM, Penichet ML, Shin SU, Poles T, Mosammaparast N, Mahmood K, Slamon DJ,
Morrison SL, and Rosenblatt JD. A B7.1-antibody fusion protein retains antibody
specificity and ability to activate the T cell costimulatory pathway. J Immunol 160:
3419-3426, 1998.

Challita-Eid PM, Abboud CN, Morrison SL, Penichet ML, Rosell KE, Poles T, Hilchey SP,
Planelles V, and Rosenblatt JD. A RANTES-antibody fusion protein retains antigen
specificity and chemokine function. J Immunology 161: 3729, 1998.
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with a RANTES.IgG3 fusion protein. AIDS Research and Human Retroviruses 14:1617, 
1998. 
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Rosenblatt JD. Eradication of pre-established lymphoma using HSV amplicon vectors. 
Blood 93: 643, 1999 

Penichet ML, Challita PM, Shin S-U, Sampogna S, Rosenblatt JD, and Morrison SL. In vivo
properties of three human HER2/neu-expressing murine cell lines in immunocompetent
mice. Laboratory Animal Science 49: 179-88, 1999.

Penichet ML, Dela Cruz JS, Challita-Eid PM, Rosenblatt JD, Morrison SL. A murine B cell
lymphoma expressing human HER2/neu undergoes spontaneous tumor regression and
elicits antitumor immunity. Cancer Immunol Immunother 49:649-62, 2001.

Hilchey SP, Rosebrough SF, Morrison SL, Rosenblatt JD, and Challita-Eid PM. Specific
targeting and stimulation of in vivo anti-tumor response using a B7.1 T-cell
costimulatory antibody fusion protein. Manuscript in preparation.



Select Abstracts and Presentations: 

Challita PM, El-Khoueiry AB, and Kohn DB. Silencing of retroviral vectors after transduction 
of murine hematopoietic stem cells is associated with methylation. Blood 80 (10 Suppl. 1): 
168a, 1992. 

Challita PM, Cook C, Sender LS, and Kohn DB. Novel retroviral vectors for consistent 
expression after transduction into hematopoietic stem cells. Keystone Symposium 
on Gene Therapy, Keystone, Colorado, 1993. 

Challita PM. Retroviral vector expression in murine stem cells. Presentation. Division 
of Hematology-Oncology, University of California Los Angeles, October, 1994. 

Challita PM, Shin S-U, Penichet M, Mahmood K, Poles TM, Rosell KE, Abboud CN, Morrison 
SL, Rosenblatt JD. Novel Antibody Fusion Proteins for the Stimulation of a Tumor- 
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A reassessment of the translation initiation codon in 
vertebrates 

Suraj Peri and Akhilesh Pandey 



More than two decades ago Marilyn Kozak 
proposed the scanning model of translation 
initiation, whereby translation is initiated at 
the first AUG codon that is in a particular
context. In this article, we re-examine the
context of initiator codons using a large 
dataset of curated human transcripts. We 
find that more than 40% of transcripts 
contain AUG codons upstream of the actual 
start codon and that most authentic AUGs 
contain three or more mismatches from the 
consensus sequence, CCACCaugG. Also, 
in a large fraction of transcripts, the 
sequences surrounding the initiator codon 
deviate more from the consensus than 
those surrounding upstream AUGs, 
indicating that translation initiation from 
downstream AUGs is more common than 
generally believed. 

In this article, we re-examine the position
requirement and the context dependence
for an AUG codon to be used for
translation initiation as proposed by
Kozak (Ref. 1). The evidence in favor of the
first-AUG rule is mainly derived from the fact
that the first AUG was used in about
90-95% of cases in a study of several
hundred vertebrate mRNAs (Ref. 2). The
context refers to the nucleotide sequence
surrounding the AUG (generally
accepted as being CCACCaugG) (Refs 3,4). This
'consensus' has been derived from
observing the sequences surrounding
the first AUG in exons as well as by
mutagenesis studies. Although these
mutagenesis studies provide data on
what might constitute an 'optimal'
sequence, they do not accurately predict
whether a given AUG found in a natural
mRNA transcript is likely to encode the
initiator methionine.

As a result of the complete sequencing
of the human genome, tens of thousands
of novel genes have been identified, and
their translation initiation sites need to be
predicted (Refs 5,6). Therefore, we decided to take
another look at two of the major issues
that underlie the scanning model of
translation initiation. For our analysis,
we have worked exclusively with
well-characterized and annotated human
mRNA sequences from the reviewed
RefSeq dataset (Ref. 7). RefSeq is a project by
the National Center for Biotechnology
Information (NCBI) to create curated
nonredundant entries for each gene and
contains predicted, provisional and
reviewed entries for mRNAs and proteins.
Reviewed entries are those where a
human curator has performed an
extensive manual verification, and by
using these entries, we hope to avoid the
annotation errors that otherwise abound
in databases (Ref. 8).

It has been proposed that certain 
classes of molecules, such as oncogenes, 
growth factors and receptors, are 
translated poorly and could contain a 
higher frequency of upstream AUG 
codons in their mRNAs as a mode of
regulation (Ref. 2). We therefore subdivided
our dataset into two classes: transcripts 
encoding cytosolic molecules, and those 
encoding proteins that are secreted or 
bound to the plasma membrane (i.e. those 
products with signal peptides). We first 
examined the nucleotide sequences 
surrounding their established initiator 
codons. Second, we inspected whether 
any AUGs exist upstream of the actual 
initiator codon in the 5' untranslated 
regions (UTR) of these mRNAs. Last, 
because we found a significant number 
of transcripts with an upstream AUG, 
we examined whether there is any 
relationship between the size of the 
reported upstream 5' UTR and the 
number of unused upstream AUGs. 

Initiator codons in transcripts encoding 
cytosolic proteins 

We studied the sequence context of
AUGs from a dataset of 1534 reviewed
transcripts encoding cytoplasmic
proteins; the observed frequencies are
shown in Table 1. When considered
individually, the nucleotides that form
the consensus CCACCaugG are found in
32-53% of transcripts. When only -3 and
+4 positions are examined, only 46% of
transcripts contain a purine (A or G) at
-3 and a G at +4. Thus, over half of the
transcripts differ from what are believed
to be the most conserved nucleotide
positions (-3 and +4) surrounding the
AUG. We did not find that specific
nucleotides occurred at position -5 with
significantly higher frequency than
random (P-value < 0.05). The degree of
deviation of individual sequences from
the consensus was also calculated and is
discussed below.

Initiator codons in transcripts encoding 
cytokines, growth factors and receptors 
The assignment of an AUG as an initiator
methionine on the basis of genomic
sequences can be quite contentious.
Even when a protein sequence derived
from a cloned cDNA is used, there can
be disagreements on several issues.
Therefore, we have taken a biological
approach. Signal peptides are found at
the amino termini of secreted factors such
as cytokines and growth factors, as well
as of type I transmembrane receptors
that have their amino terminus located
extracellularly (Refs 9,10). These peptides are
approximately 15-30 amino acids long
and contain a stretch of hydrophobic
amino acids. Several excellent programs
are available that predict both the
presence of signal peptide and the
cleavage site (Refs 11,12). Because signal peptides



Table 1. Frequency of nucleotides surrounding the initiator codon of transcripts encoding 
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•Sequences surrounding the initiator codon of 1534 manually reviewed RefSeq transcripts encoding cytoplasmic 
proteins. The frequency of occurrence of indicated nucleotides surrounding the initiator codon at positions -5 to 
+4 with respect to ATG is shown.
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Fig. 1. Analysis of sequence contexts surrounding initiator codons and upstream unused AUGs. (a) Mismatch 
frequency of the nucleotides surrounding the initiator codon observed in natural transcripts as compared with the 
Kozak's consensus (CCACCaugG). This dataset is composed of 1534 transcripts encoding cytoplasmic proteins. 
The random probability of occurrence of no, one, two, three, four, five or six mismatches is 0.02%, 0.4%, 3.3%,
13.2%, 29.7%, 35.6% and 17.8%, respectively. (b) Comparison of the contexts of upstream unused AUGs with the
authentic AUG, when CCACCaugG was considered as the optimal context (from a total of 2195 RefSeq transcripts).
Yellow, the percentage of transcripts that contained at least one upstream AUG with fewer mismatches than the
authentic AUG (i.e. the upstream AUG is in a more favorable context); green, the percentage of transcripts where
all upstream AUGs were in a similar context (indicated by the same degree of mismatch from the consensus);
orange, all other transcripts. (c) Comparison of the contexts of upstream unused AUGs with the authentic AUG
when a purine (A or G) at -3 and a G at +4 was considered as the optimal context (from a total of 2195 RefSeq
transcripts). The color code is the same as in (b). 



can be recognized easily in these classes
of proteins, assignment of the actual
initiator methionine is obvious.
Additionally, the cDNAs of most of these
genes have been expressed in cells,
validating the assignment of the signal
peptides functionally.

We therefore compiled a list of
661 cytokines, growth factors and
receptors, and then tabulated the
nucleotide sequences surrounding the
initiator methionine residue from
transcripts encoding these proteins. As
in the case above, only 41.8% of the
transcripts contained a purine at -3
and a G at +4 (data not shown). The
observations from this dataset essentially
paralleled the results shown in Table 1,
indicating that cytokines and growth
factors do not contain any atypical motifs.

Frequency of initiator codons is not in 
agreement with the theoretical consensus 
'CCACCaugG' 

We next decided to investigate how often
a real initiator methionine from our
dataset is in agreement with the
consensus and to express any deviation
from the consensus as the number of
mismatches observed. If the surrounding
sequences were almost or entirely
identical to Kozak's consensus, then one
would expect to find most proteins with no
or a single mismatch. However, if they
were more randomly distributed, then the
number of mismatches would be around
three or four (because having exactly five
or six mismatches is more constrained in
terms of probability). Interestingly, we
found that only ~24% of transcripts
encoding cytosolic proteins had two or fewer
mismatches compared with the consensus
(Fig. 1a). This implies that a majority of
transcripts contain initiator codons that
are not in close agreement with Kozak's
consensus sequence. The same
phenomenon was seen when proteins
with signal peptides were considered
(data not shown).
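
The mismatch count described above, and the random expectation quoted in the Fig. 1 legend, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the six scored positions are taken to be -5 to -1 plus +4, and the random model assumes each base is equally likely at every position.

```python
from math import comb

CONSENSUS_UPSTREAM = "CCACC"  # positions -5 to -1 of CCACCaugG
CONSENSUS_PLUS4 = "G"         # position +4


def kozak_mismatches(mrna: str, aug_index: int) -> int:
    """Count mismatches (0-6) between the context of an AUG and CCACCaugG.

    aug_index is the 0-based position of the A of the AUG (an assumption
    of this sketch; the paper does not specify a coordinate convention).
    """
    seq = mrna.upper().replace("T", "U")
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    if aug_index < 5 or aug_index + 3 >= len(seq):
        raise ValueError("not enough flanking sequence to score the context")
    upstream = seq[aug_index - 5:aug_index]
    plus4 = seq[aug_index + 3]
    mismatches = sum(a != b for a, b in zip(upstream, CONSENSUS_UPSTREAM))
    return mismatches + (plus4 != CONSENSUS_PLUS4)


def random_mismatch_distribution(n_positions: int = 6) -> list[float]:
    """Binomial probability of k = 0..n mismatches when every base is
    equally likely, so each position mismatches with probability 3/4."""
    p = 0.75
    return [
        comb(n_positions, k) * p**k * (1 - p) ** (n_positions - k)
        for k in range(n_positions + 1)
    ]
```

Multiplying the values returned by `random_mismatch_distribution()` by 100 gives percentages that can be compared with the figures listed in the Fig. 1 legend.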

Frequency of upstream AUGs 
To determine how often the most 5' AUG 
is used, we decided to inspect the 
transcripts for the presence of AUGs 
that were upstream of the initiator 
methionine. Here, we expected there 
would be no upstream AUGs in most of 
these cases. However, again to our 
surprise, we found that only slightly more 
than half of the transcripts contained no 
upstream AUG. In fact, 41% of transcripts 
had one or more, and 24% of genes had 
two or more upstream AUGs (data not 
shown). This means that, whatever the 
reason, the second, third or a further 
downstream AUG is chosen for 
translation initiation in these cases. Of 
course, if one were to assign the first AUG 
as the initiator methionine in these 
transcripts, the predicted open reading
frame (ORF) and the length of the
encoded protein would be erroneous. The 
lack of any significant difference in the 
distribution of cytosolic proteins and 
those with signal peptides indicates that 
the class of proteins coded for by the 
mRNAs is not a reliable indicator of 
atypical behavior of mRNAs. 
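
Counting unused upstream AUGs, as done in this section, might look like the sketch below. The function name and the 0-based coordinate of the annotated start codon are assumptions made for illustration; AUGs are counted in any reading frame.

```python
def upstream_augs(mrna: str, start_index: int) -> list[int]:
    """Return 0-based positions of every AUG that lies entirely within the
    5' UTR, i.e. before the annotated start codon at start_index."""
    seq = mrna.upper().replace("T", "U")
    utr = seq[:start_index]
    return [i for i in range(len(utr) - 2) if utr[i:i + 3] == "AUG"]
```

A transcript would then be tallied as having no, one, or two or more upstream AUGs according to the length of the returned list.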

It has been argued that it is the first 
AUG with a favorable context that is 
used for translation initiation. Therefore, 
we decided to compare the contexts of 
upstream unused AUGs with that of the 
authentic AUG in two ways. In the first
method, we calculated the degree of
mismatch for each of the upstream AUGs
from the consensus, CCACCaugG, and
compared it with the degree of mismatch
of the authentic AUG. We divided the 
transcripts into three groups based on 
these results: (1) those that contained 
at least one upstream AUG in a more 
favorable context; (2) those where each 
of the upstream AUGs had a similar 
context; and (3) those where either all the 
upstream AUGs were in a less favorable 
context or some were in a less favorable 
context and others a similar context. 
In the second method, we divided the 
transcripts according to the degree of 
mismatch from a motif in which only 
two positions were considered as 
optimal; that is, a purine at -3 and a 
G at +4. The transcripts were again 
divided into three groups as in the first 
method. The object of this comparison
was to identify the number of cases in
which either a more favorable or similar
AUG codon existed upstream of the 
authentic AUG. These two groups 
(categories 1 and 2, above) would 
therefore represent transcripts in 
which the first AUG that had a 
favorable context was not chosen for 
translation initiation. 
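
The three-way grouping described above can be expressed compactly. The sketch below is illustrative only: the function names and group labels are hypothetical, and "similar" is interpreted as an identical mismatch count, as the Fig. 1 legend suggests.

```python
def minimal_motif_mismatches(minus3: str, plus4: str) -> int:
    """Second method: score only a purine at -3 and a G at +4 (0-2 mismatches)."""
    return int(minus3.upper() not in ("A", "G")) + int(plus4.upper() != "G")


def classify_transcript(authentic_mismatch: int,
                        upstream_mismatches: list[int]) -> str:
    """Assign a transcript to one of the three groups used in the text.

    Fewer mismatches from the chosen motif means a more favorable context.
    """
    if not upstream_mismatches:
        return "no upstream AUG"
    if any(m < authentic_mismatch for m in upstream_mismatches):
        return "more favorable"  # group 1: at least one better upstream AUG
    if all(m == authentic_mismatch for m in upstream_mismatches):
        return "similar"         # group 2: every upstream AUG scores the same
    return "other"               # group 3: all worse, or a mix of worse and same
```

The same classifier serves both methods; only the mismatch scores passed in differ.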

The results according to the first 
method show that 35% of transcripts 
contained at least one upstream AUG 
that was in a more favorable context 
than the actual initiator, with 12%
of transcripts containing upstream 
AUGs in a similar context to the 
authentic AUG (Fig. 1b). Figure 1c shows 
essentially similar results when only a 
purine at -3 and a guanine at +4 were 
considered as the optimal motif 
(according to the second method). Our 
analysis therefore reveals that in almost 
half the cases, there was at least one 
upstream unused AUG codon that was in 
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Fig. 2. The length of 5' untranslated region {UTR) in 
basepairs plotted against the number of transcripts 
having no, one or ≥ two AUGs upstream of the actual
translation start codon. 2195 RefSeq transcripts were
analyzed. No significant difference was found between
datasets of transcripts encoding cytoplasmic proteins
and those with signal peptides. 



a similar or better context than the 
authentic AUG. 

The length of the 5' untranslated region
is related to the number of upstream
unused AUGs

While we were performing this analysis, we
were intrigued by the fact that if the 5' UTR
was long, the transcripts invariably
contained upstream AUGs that were not
used. We therefore decided to investigate
this systematically. Figure 2 shows a
histogram of the number of transcripts
with no, one and ≥ two upstream AUGs
plotted against the length of the 5' UTR.
Most of the transcripts (85%) with 100 bp
or less of 5' UTR sequence do not contain
any upstream AUGs, and only 2.6% contain
two or more upstream AUGs. Quite the
opposite is observed for transcripts with
5' UTRs longer than 300 bp. In this case,
70% of the transcripts contain two or more
upstream AUGs, and only 13.5% contained
no upstream AUGs. As the length of the
5' UTR increases, the number of
transcripts with no upstream AUGs
decreases and the number of transcripts
with unused upstream AUGs increases. We
see the same phenomenon in both classes
of our dataset, indicating that it is not
dependent on the type of protein being
studied. Considering that the average
length of the 5' UTR for genes in the human
genome is 300 bp (Ref. 6), our data suggests
that one has to be quite careful with the
'first AUG' rule because it is probable that
the first AUG is not being used in a
significant number of transcripts.
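
A tabulation of the kind summarized in Fig. 2 could be produced along these lines. This is a sketch under stated assumptions: only the 100 bp and 300 bp thresholds come from the text, the intermediate bin and all names are hypothetical, and the input records are whatever (UTR length, upstream AUG count) pairs an analysis supplies.

```python
from collections import Counter
from typing import Iterable


def utr_bin(length_bp: int) -> str:
    """Bin a 5' UTR length; the middle bin is an assumption of this sketch."""
    if length_bp <= 100:
        return "<=100"
    if length_bp <= 300:
        return "101-300"
    return ">300"


def uaug_class(n_upstream: int) -> str:
    """Collapse an upstream AUG count into the three classes used in Fig. 2."""
    if n_upstream == 0:
        return "none"
    return "one" if n_upstream == 1 else "two or more"


def tabulate(records: Iterable[tuple[int, int]]) -> Counter:
    """Count transcripts per (UTR length bin, upstream AUG class).

    Each record is (utr_length_bp, n_upstream_augs).
    """
    return Counter((utr_bin(length), uaug_class(n)) for length, n in records)
```

Dividing each count by the total for its length bin gives percentages of the kind quoted in the paragraph above.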



Conclusions 

Our analysis essentially focused on 
testing whether there is any consensus 
around the initiator codon in transcripts 
encoding known proteins. Transcripts 
that encode well-studied proteins provide
a more applicable dataset, as these 
proteins are not predictions and have 
been worked on by scores of investigators 
worldwide. They are also good candidates 
to test the predictive value of Kozak's 
criteria when considering assignment of 
a given AUG in the transcript of a newly 
discovered gene. Our analysis shows that 
a large number of transcripts contain 
AUGs upstream of the actual translation 
start site, many of which are in a more 
favorable context than the codon used 
for translation initiation. Furthermore, 
our data shows that most of the AUGs 
used for translation initiation deviate 
significantly from Kozak's consensus. It 
is possible that mechanisms such as 
leaky scanning, reinitiation or internal 
initiation of translation have a much 
greater role than previously
imagined (Refs 13-16). In support of this idea,
a growing number of transcripts have
recently been reported to undergo
internal initiation (Refs 17-19). For the purposes
of gene prediction and identification of
translation start sites from genomic DNA
or cDNA sequences, it is better to use 
homology-based alignments across 
protein families or across species to 
identify the initiator codon correctly, 
instead of relying solely on the most 
upstream AUG and its context. 
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1 Introduction 



The control of gene expression is achieved by a series
of complex mechanisms which can be divided into
two basic phases. The first phase, which involves the
processing of sequence information from DNA, through
transcription, RNA splicing, and transport through the
nuclear membrane to yield a mature mRNA, has been
relatively well characterized for many genes through
nucleic acid sequencing approaches. The second phase,
involving translation into protein (dependent on mRNA
translatability), folding, assembly into multimers,
transport to an appropriate subcellular location,
post-translational modifications, and final destruction,
has been less comprehensively characterized. Both phases
are likely to contain important control points associated
with gene regulation underlying differentiation, disease
processes and drug effects. For a variety of reasons, it
would be useful to know the extent to which mRNA abundances
are predictive of corresponding protein abundances. A
series of powerful methodologies, including Transcript
Imaging [1], SAGE [2], differential display [3] and array
hybridization [4-6], have been developed to detect and
in some cases quantitate differences in mRNA composition
between different samples. In parallel, high resolution
protein mapping systems, based on two-dimensional (2-D)
electrophoresis [7], have been employed to
build quantitative databases describing gene expression
at the protein level [8-11]. By combining these
approaches, it is possible for the first time to examine both
levels at which gene expression is controlled, and


A comparison of selected mRNA and protein
abundances in human liver

In order to obtain an estimate of the overall level of correlation between
mRNA and protein abundances for a well-characterized pharmaceutically
relevant biological system, we have analyzed human liver by quantitative
two-dimensional electrophoresis (for protein abundances) and by Transcript Image
methodology (for mRNA abundances). Incyte's LifeSeq database was searched
for expressed sequence tag (EST) sequences corresponding to genes of
23 proteins identified on 2-D maps in the Large Scale Biology (LSB) Molecular
Anatomy database, resulting in estimated abundances for 19 messages
(4 were undetected) among 7926 liver clones sequenced. A correlation
coefficient of 0.48 was obtained between the mRNA and protein abundances
determined by the two approaches, suggesting that post-transcriptional regulation
of gene expression is a frequent phenomenon in higher organisms. A comparison
with published data (Kawamoto, S., et al., Gene 1996)
on the abundances of liver mRNAs for plasma proteins (secreted by the liver)
suggests that higher abundance messages are strongly enriched in secreted
sequences. Our data confirms this: of the 50 most abundant liver mRNAs, 29
coded for secreted proteins, while none of the 50 most abundant proteins
appeared to be secreted products (although four plasma and red blood cell
proteins were present in this group as contaminants from tissue blood).

thereby to develop a global understanding of gene
expression control.

To date, we are aware of surprisingly little published
work on the overall relationship of message and protein
abundance, with the exception of a recent study by
Kawamoto et al. [12], comparing mRNA levels obtained
for plasma protein genes by transcript image methodology
with the abundances of the corresponding plasma
proteins in circulation. This report appeared to show a
strong correlation between mRNA and protein abundance,
based on data for nine human gene products. It
seemed likely, however, that such secreted proteins
constitute a special case, since they are rapidly delivered
from the cell of synthesis to the plasma compartment,
where many of the mechanisms that regulate cellular
protein abundance are presumably absent. We therefore
decided to compare mRNA and protein levels for a
larger series of cellular molecules in order to see
whether a simple relationship exists between mRNA and
protein abundance for this class, and to see whether
mRNAs for major cellular proteins are generally more or
less abundant than those for major secreted products.

2 Materials and methods 

Samples for 2-D electrophoresis were prepared by
rapidly mixing a frozen powder of human liver (prepared
and stored at liquid nitrogen temperature in the
National Biomonitoring Specimen Bank at the US
National Institute of Standards and Technology) with an
8-fold excess of 9 M urea, 2% NP-40, 1% mercaptoethanol
and 2% carrier ampholytes (LKB 9-11). Ten µL
of the resulting sample was analyzed using the ISO-DALT
2-D electrophoresis system, and the gels stained
with colloidal Coomassie Brilliant Blue (CBB) G-250 as
previously described [13-16]. Each stained slab gel was
digitized in red light at 134 µm resolution using an
Eikonix 1412 scanner and the digitized gel images
processed using the Kepler software system (Large Scale
Biology) to give protein abundances in terms of pixel ×
gray-level values, as well as group average abundances
and standard deviations over a set of seven male human
livers. Relative abundances were computed by dividing
individual average abundances by the average total
abundance of the proteins resolved on the gels. A series of
proteins was identified on these gels based on close
homology with identified rodent liver spots and on
identifications published by Hughes et al. [17]. Total cellular
RNA was extracted from samples of human liver tissue
by the method of Chirgwin et al. [18], and poly-A+ RNA
was prepared by hybridization to oligo-dT cellulose. Five
µg of poly-A+ RNA was used to construct a cDNA
library using the Gubler and Hoffman method [19] in
bacteriophage lambda UNIZap (Stratagene Inc., La
Jolla, CA). The library was converted to plasmid DNA by
bulk excision, and individual colonies were selected for
DNA template preps. The templates were sequenced
enzymatically (Sanger et al. [20]) on an ABI 373
automated DNA sequencer. Templates considered sequenced
successfully contained > 230 bases of cDNA insert
sequence after removal of repetitive and low information
sequences, > 90% base call accuracy, and were not of
mitochondrial, vector or host origin. Resulting DNA
sequences were analyzed using the BLAST program for
similarity with other known primate, mammalian, and
subsequently all divisions of GenBank. Similarity data
was stored and tabulated in the LifeSeq software
(Incyte, Palo Alto, CA), from which relative fractions of
specific gene products present within the starting RNA
prep were calculated as follows: % abundance = # clones
representing each gene / total # of genes sampled × 100.
A total of 7925 clones were sequenced from liver
obtained from two individuals: one male (5054 clones) and
one female (2871 clones). Data from Table 1 of Kawamoto
et al. [12] was replotted using protein abundances
for human plasma proteins taken as mean values of the
range presented in reference [21]. An error in the
abundance of the haptoglobin α1S polypeptide (which was
assumed in [12] to account for the entire abundance of
the haptoglobin α2β2 tetramer) was corrected.

Table 1. Protein and mRNA abundances in human liver reported for 23 selected molecules

Protein name                          Protein  Average protein  Protein standard  Average protein  Average message
                                      code     abundance a)     deviation         abundance (%)    abundance
Carbamyl phosphate synthase           CPS      [illegible]      12379             2.83             0.139%
Actin beta                            ACTB     [illegible]      17793             1.41             0.189%
Heat shock protein 60                 HSP60    37656            1939              1.05             0.038%
Protein disulfide isomerase           PDI      31260            1942              0.87             0.025%
78 KD glucose regulated protein/BIP   BIP      31050            1993              0.87             0.013%
Calreticulin                          [illegible]  [illegible]  2076              0.85             0.038%
F1 ATPase beta                        F1ATPB   29529            1275              0.82             0.038%
Actin gamma                           ACTG     23316            9013              0.65             0.215%
Heat shock cognate 70                 HSC70    21647            908               0.60             0.013%
Cytochrome b5                         CYB5     [illegible]      1656              0.52             0.088%
Endoplasmin                           ENPL     17817            5429              0.50             0.063%
75 KD glucose regulated protein       CR75     [illegible]      1531              0.46             0.013%
Pyruvate carboxylase                  PYVC     14655            1930              0.41             Not detected
Heat shock protein 70                 HSP70    1639             1565              0.24             0.013%
Tubulin beta 1                        TBB1     7125             1472              0.20             0.038%
Vimentin                              VIME     [illegible]      952               0.18             Not detected
Tropomyosin                           TPM      4090             600               0.11             0.013%
NADPH cytochrome P-450 reductase      NP450R   3303             1319              0.09             Not detected
Tubulin alpha 1                       TBA1     3057             1409              0.09             0.063%
Heat shock protein 90                 HSP90    2740             597               0.08             0.025%
Cytochrome oxidase II (mit encoded)   COX-II   2384             651               0.07             Not measured
Laminin receptor                      LAMR     1531             602               0.04             0.050%
Lamin B                               LAMB     1454             371               0.04             0.025%

[The column "Number of clones (BLAST)" is not legible in this copy.]

a) Protein abundance is given in pixel-gray levels (the integrated CBB optical density of the appropriate
spot or spots on a 2-D gel), where multiple spots comprising a single gene product have been summed.
Messenger RNA measurements are given as a percentage of the total number of clones sequenced in the
relevant transcript images.
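
The percent-abundance calculation in the methods above amounts to a single ratio. A minimal sketch follows, reading the denominator as the total number of clones sequenced (as the Table 1 footnote states); the function name is hypothetical and the clone counts in the usage note are illustrative, since the per-gene clone counts are not legible in this copy.

```python
def percent_abundance(clones_for_gene: int, total_clones: int) -> float:
    """% abundance = clones representing the gene / total clones sequenced x 100."""
    if total_clones <= 0:
        raise ValueError("total_clones must be positive")
    if clones_for_gene < 0:
        raise ValueError("clones_for_gene must be non-negative")
    return 100.0 * clones_for_gene / total_clones
```

With the 7925 clones stated in the methods, a message seen in a single clone would come out at roughly 0.013%, which is the smallest nonzero message abundance listed in Table 1.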



3 Results 

Protein and mRNA abundance data were collected for a 
set of gene products identified on 2-D gels (Table I). 
Standard deviations of the protein measurements across 
six individual livers were relatively low, averaging 19 of 
the mean abundance. Of the 23 selected proteins, 
raRNAs for 19 were detected in human liver transenpt 
images. Of these 19, Ove were represented by 1 clone, 
three by 2 dones, four by 3 dones, and the rest by 
between 4 and 17 dones. Of the four Bene products 
undetected at the mRNA level, one (cytochrome oxidase 
subunit II: COX-II) was deleted from the Transnpl 
Image dataset during standard initial sequence data 
workup, which removes all mitochondrial sequences; A 
plot of protein abundance (expressed as integrated Coo- 
massie Blue absorbance averaged over seven individual 
livers) wsus mRNA abundance" (expressed as per- 
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ACTS 



*crc 

• • ere 

CTBS 



CM* 



Him 



rrvc 



1-0 H» 

/W J. A log-teg pM or Ihe abundances of etch of 23 gene products 
it the protein level (X-axis) tod mRNA level (YixK). Four proteins 
for which mRHA mcasurcmcou were not ivaiUbk - three for wUch 
no clones were detected* tad one intcnkxwlly deleted Trom Ihe RTI 
fkuxt (COX-HJ ~ shovn boxed at Ihe lower left, with correct 
relative protein Abundances. The Pearson product moment correlation 
cwemcicpl between the two sets of 1? valid measurements Is 0,4*. 
Each measurement is labeled with * code whose identity is shown In 
Table I. 



Messtge 
Abundance , 
(% of clones) 



0 »VCf 



.Mi 



• AT* 



• yen 



• tixrr 



Abundance In Plasma ^flOOrf) 

flfciw 2. A lot-lot pkK of dau on mRKA abundance taken from 
Kawamoto et mi Itfl »rmri average P«*eln abundances In pUsma 
taken from P I J- The protein abundance value for Ihe haptoglobin als 
polypeptide has been corrected lo reflect the Tad that this aubMwl 
accounts for only 21 * of the mass of the haptoglobin Ojfc Utraraer. 

centage of total cDNA clones in the transcript images of
two livers) indicates a modest correlation between the
two (Fig. 1). The Pearson product-moment correlation
coefficient obtained from the 19 pairs of measurements
is 0.48. The abundance values obtained at the protein
level spanned a 70-fold range, while the detectable
mRNA abundances spanned a 16-fold range for these
genes (although the latter value may reflect the limited
number of clones sequenced). One particularly interesting
subset of measurements concerns the β and γ
actins. Here the mRNA abundances are, respectively,
0.189% and 0.215%, whereas the protein abundances are,
respectively, 1.41% and 0.65% of the total. In this
comparison, both sets of measurements are likely to be quite
accurate, since numerous clones were detected for each
of the two messages, and since the two proteins are so
homologous, and have such close pIs, that they should
bind CBB similarly. Nevertheless, the relative abundances
at the RNA and protein levels are inverted (β
actin is the more abundant protein, while γ actin has the
more abundant message), and the mRNA:protein ratios
for the two genes differ by more than a factor of two.
Carbamyl phosphate synthase (CPS), the most abundant
protein detected in liver over the pI range of conventional
2-D gels (pH 4-7), had a relative abundance of
2.83% (protein) and yet comprised only 0.139% of the
total message (less than either actin). In this case, the
mature protein is sequestered inside the mitochondrion,
and therefore might be expected to show slow turnover
and a consequent large disparity between mRNA and
protein abundance.

Figure 3. Relative abundance distributions of the top-ranked 100
mRNAs and proteins detected in human liver. The first (leftmost)
molecule is the most abundant, followed by molecules of decreasing
abundance through the 100th rank (at the right). Abundances of both
mRNAs and proteins are plotted as a percentage of total detected
molecules on a log scale. Message and protein points at the same rank are
not, in general, products of the same gene.
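
The ratio comparison for the two actins and CPS can be checked with a few lines of arithmetic. The sketch below uses only the percentages quoted in the text; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Relative abundances (percent of total) quoted in the text.
abundances = {
    "beta-actin": {"mrna": 0.189, "protein": 1.41},
    "gamma-actin": {"mrna": 0.215, "protein": 0.65},
    "CPS": {"mrna": 0.139, "protein": 2.83},
}


def mrna_to_protein_ratio(entry):
    """Return the mRNA:protein abundance ratio for one gene."""
    return entry["mrna"] / entry["protein"]


ratios = {gene: mrna_to_protein_ratio(v) for gene, v in abundances.items()}

# How many times larger the gamma-actin ratio is than the beta-actin ratio.
actin_fold_difference = ratios["gamma-actin"] / ratios["beta-actin"]

for gene, ratio in ratios.items():
    print(f"{gene}: mRNA:protein = {ratio:.3f}")
print(f"gamma/beta actin fold difference = {actin_fold_difference:.2f}")
```

The fold difference between the two actin ratios comes out near 2.5, consistent with the statement that the ratios differ by more than a factor of two, and the CPS ratio is lower than either actin ratio.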

A reexamination (Fig. 2) of the data of [12] on genes for
plasma proteins, using estimates for corresponding protein
abundances revised to account for the α2β2 structure
of haptoglobin, showed a higher correlation coefficient
between mRNA and protein abundance (0.96). This
value is probably exaggerated due to the large separation
of the albumin values from the rest of the data: if
albumin is omitted from the calculation, the correlation
coefficient drops to -0.19. However, it is clear that the
plasma proteins are represented by many more mRNA
copies than major cellular proteins: albumin, for
example, accounts for about 14% of the total number of
clones examined [12], with a number of other plasma
proteins accounting for more than 1% of the total each.
By contrast, none of the cellular proteins chosen from
the 2-D gel data accounted for much more than 0.1% of
the mRNAs sequenced. To further pursue this observation,
we compared the relative abundance distributions
of the 100 top-ranked (most abundant) mRNAs and proteins
in our data sets (Fig. 3). Forty-one of the top 100
mRNAs, and 29 of the top 50, coded for proteins known,
or expected from sequence, to be secreted from the liver,
while none of the top 100 proteins appeared to be secretory
forms of the human plasma proteins. The two most
abundant proteins in these samples (hemoglobin β and
albumin) as well as two of lower abundance (α1 antiprotease
and transferrin) were blood proteins that constitute
contaminants of the liver in this context: proteins
which would have been removed by perfusion.



4 Discussion 

Despite extensive work on the regulation of many individual
genes, little attention appears to have been paid
to the global question of the relation between mRNA
and corresponding protein abundance in eukaryotes. We
have attempted to provide an initial estimate of the relationship
of mRNA and corresponding cellular protein
abundances through use of correspondences between
two databases: the Molecular Anatomy™ (2-D gel) and
LifeSeq™ (Transcript Image) databases of human liver.
Using a panel of 23 proteins identified on 2-D gels of
human liver, we searched LifeSeq™ to determine the
number of clones matching the corresponding gene
sequence by BLAST. Matches were found for 19 proteins,
and the correlation coefficient obtained over this
set of data was 0.48. This number is intriguingly close to
the middle position between a perfect correlation (1.0)
and no correlation whatever (0.0). One simple interpretation
of such a value is that the two major phases of gene
expression regulation (transcription through message
degradation on the one hand, and translation through
protein degradation on the other) are of approximately
equal importance in determining the net output of functional
gene product (protein). Several issues may limit
the quantitative accuracy of this result. First, the protein
measurements rely on CBB binding to a series of different
proteins. Although the measurements obtained
show good (low) standard deviations across a set of six
individual livers, it is well known that different proteins
can bind CBB with different affinities. Thus the measurement
scale for one protein may differ from another by
up to approximately twofold. Since, however, these relative
scale errors should be normally distributed, we
expect them to have little effect on the overall correlation.
Precision of the mRNA measurements is also
limited, in this case because a limited number of clones
was detected for the selected proteins. Five genes, for
example, were represented by only one clone each
among the 7925 clones sequenced from the respective
cDNA tissue libraries. This low relative expression at the
mRNA level is expected, since a majority of the high
abundance mRNAs in liver code for plasma proteins.
However, such small numbers of clones lead to potentially
large quantitative errors because of sampling error.
Here again, we believe these errors should be relatively
random across the set of proteins chosen, and thus
should not skew the result appreciably. A third potential
difficulty is that the databases used for the protein and
mRNA abundance estimates were prepared from different
samples. In future, it will thus be of great interest
to repeat the experiment using the same samples to
examine both mRNA and protein abundances.
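
The size of the sampling error mentioned above can be illustrated with a short calculation. The sketch below assumes that clone counts follow Poisson statistics (an assumption made here for illustration, not stated in the text), so that a count of n clones carries a relative standard error of about 1/sqrt(n). The total of 7925 clones is taken from the text; the function name is hypothetical.

```python
import math

TOTAL_CLONES = 7925  # clones sequenced, as stated in the text


def clone_count_summary(count, total=TOTAL_CLONES):
    """Return (abundance in percent, approximate relative sampling error).

    Assumes Poisson counting statistics: the standard deviation of a
    count n is sqrt(n), so the relative error is 1 / sqrt(n).
    """
    abundance_percent = 100.0 * count / total
    relative_error = 1.0 / math.sqrt(count)
    return abundance_percent, relative_error


for n in (1, 4, 15, 100):
    pct, err = clone_count_summary(n)
    print(f"{n:>3} clones: {pct:.4f}% of total, relative error ~{err:.0%}")
```

Under this assumption a gene represented by a single clone has an abundance estimate of roughly 0.013% with an uncertainty as large as the estimate itself, whereas a gene represented by about 15 clones (close to the 0.189% quoted for one of the actin messages, if the same total applies) has a relative error of roughly a quarter.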

Despite these potential sources of error, at least one
homologous pair of proteins (the β and γ actins) shows
persuasive evidence of post-transcriptional regulation,
with mRNA-to-protein ratios differing by more than a
factor of two between the two genes. This is a particularly
striking case since the two proteins are essentially
indistinguishable in function (apart from affinity for
MgADP [22]), have very similar sequences, and are produced
in a constant ratio (approximately 2:1 in males) in
virtually all cell types. One possible alternative explanation
could be a sex difference in liver expression of γ
actin, as is seen in rodents [23] where γ actin protein
expression averages almost twice as high in females as
males. This seems unlikely since 64% of clones in the
RTI data were from male liver, and all the 2-D data was
from male livers.

An analogous set of data for plasma proteins secreted by
the liver has been published by Kawamoto et al. [12] and
we have reanalyzed their values to see whether a similar
mRNA-to-protein relationship holds. It appears, based
on nine plasma proteins, that a higher correlation coefficient
applies: 0.96. This result is less convincing, however,
because one gene product (albumin) is well separated
from the cluster of the remaining eight, and thus
exercises a disproportionate influence on the correlation
coefficient. In fact, if albumin is omitted from the calculation,
the correlation coefficient is reduced to -0.19,
which suggests a very poor correlation.
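
The influence of a single well-separated point on a Pearson coefficient can be reproduced with a small sketch. The nine value pairs below are hypothetical numbers invented for illustration (eight clustered points plus one distant point standing in for albumin); they are not the Kawamoto et al. data and will not reproduce the coefficients quoted above.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical abundances: the last pair is the distant outlier.
mrna = [1.2, 0.8, 1.5, 0.6, 1.1, 0.9, 1.4, 0.7, 14.0]
protein = [2.0, 2.6, 1.8, 2.4, 2.9, 2.2, 1.9, 2.7, 55.0]

r_all = pearson(mrna, protein)
r_without_outlier = pearson(mrna[:-1], protein[:-1])

print(f"all nine points:     r = {r_all:.2f}")
print(f"outlier left out:    r = {r_without_outlier:.2f}")
```

With the outlier included the coefficient is close to 1; with it removed the remaining cluster shows a negative coefficient, which is the same qualitative behaviour described for the plasma protein set.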

What is perhaps more striking is the relatively much
higher abundance of the plasma protein mRNAs as
compared to major cellular proteins such as carbamyl
phosphate synthase, the actins, or cytochrome b5.
Mid-abundance plasma proteins were represented by mRNAs
having approximately 100-fold higher relative abundance
than mid-abundance cellular proteins. This result is verified
by a direct comparison of the relative abundance
distributions of the 100 top-ranked mRNAs and proteins
in our data sets (which are, in general, different sets of
genes). Twenty-nine of the top 50 messages are secreted
products, while none of the top 50 proteins appear to be
the pro- form of a secreted molecule. Such a conclusion
is not surprising, since the liver is responsible for generating
high protein concentrations in the relatively large
plasma compartment of the body, but does so by means
of closely coupled synthesis and secretion with little
accumulation of precursor proteins in process. This
points to a potentially significant difference in the pictures
obtained from mRNA and protein abundance databases.
Major secreted proteins appear to have much
more abundant mRNAs than many important cellular
proteins, and hence mRNA abundance databases that
concentrate on a small number of the highest abundance
messages may be biased towards secreted proteins
over cellular molecules. This represents an advantage of
the mRNA approach relative to protein databases in the
search for novel cytokines and other secreted proteins,
but a disadvantage in the characterization of cellular
metabolic and control processes. Additionally, it suggests
that mRNAs for secreted proteins may have, on the
whole, shorter half-lives than mRNAs for cellular
enzymes, the latter being more frequently regulated at
the translational level.

We also found important differences in the overall
shapes of the relative abundance distributions of the 100
top-ranked mRNAs and proteins. While both distributions
contain a few very high abundance molecules (in
the 3-10% range) they appear to diverge significantly
below the 15th most abundant gene product, with proteins
16-100 accounting for roughly twice as high a relative
abundance as the 16th-100th mRNAs. Not all proteins
are represented on the 2-D gels used here (which
fail to resolve proteins with pI > 7), but the estimated
40% of proteins thus excluded would not affect the
shape of the distribution over positions 50-100 significantly
if they have an abundance distribution similar to
the pI 4-7 proteins (based on a simulation using the
data shown). The mRNA abundance distribution covers
all cloned messages (not a subset of genes), and for
abundant mRNAs it should be complete as it stands.
Altogether, the top 100 mRNAs comprise 51.3% of the
total clones, while the top 100 proteins comprise 63.r%
of the total protein detected. Hence it appears likely that
the distribution of protein abundances is significantly different
from that of mRNAs, showing a more gradual fall-off
in the region examined, and that techniques able to
detect down to a specified percent abundance threshold
would reveal more proteins at a given threshold than
mRNAs. As the protein and nucleic acid databases
expand, we anticipate the possibility of generating successively
more robust estimates of the global relationship
between mRNA and protein abundance, and thus a
better understanding of multi-level gene expression control
in complex organisms such as man.
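
The threshold argument above can be made concrete with a small sketch. The two rank-abundance curves below are hypothetical power-law shapes chosen only to mimic a steep fall-off (mRNA) and a more gradual one (protein); they are not the measured distributions, and the function names are illustrative.

```python
def count_detectable(abundances_percent, threshold_percent):
    """Number of molecules at or above a detection threshold (in percent)."""
    return sum(1 for a in abundances_percent if a >= threshold_percent)


def share_of_ranks(abundances_percent, first_rank, last_rank):
    """Combined abundance (percent) of ranks first_rank..last_rank, 1-based."""
    ordered = sorted(abundances_percent, reverse=True)
    return sum(ordered[first_rank - 1:last_rank])


# Hypothetical curves: steeper decay for mRNAs, more gradual for proteins.
ranks = range(1, 101)
mrna_percent = [10.0 / rank ** 1.2 for rank in ranks]
protein_percent = [5.0 / rank ** 0.8 for rank in ranks]

threshold = 0.2  # percent of total; an arbitrary illustrative cut-off
mrna_hits = count_detectable(mrna_percent, threshold)
protein_hits = count_detectable(protein_percent, threshold)

mrna_tail = share_of_ranks(mrna_percent, 16, 100)
protein_tail = share_of_ranks(protein_percent, 16, 100)

print(f"molecules above {threshold}%: mRNA {mrna_hits}, protein {protein_hits}")
print(f"share of ranks 16-100: mRNA {mrna_tail:.1f}%, protein {protein_tail:.1f}%")
```

With a more gradual fall-off, more molecules stay above any fixed abundance threshold and ranks 16-100 carry a larger combined share, which is the pattern the text reports for proteins relative to mRNAs.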

Human liver samples analyzed by 2-D electrophoresis were
kindly provided by the National Biomonitoring Specimen
Bank at the US National Institute of Standards and
Technology under the direction of Dr. Stephen Wise.

A database of protein expression in lung cancer

We have developed a comprehensive approach to identifying molecular changes in 
lung cancer that includes both genomic and proteomic analyses. The related effort 
has produced a large amount of data pertaining to gene expression at the RNA and 
protein levels. As a result, we have constructed a database that contains protein 
expression data on lung cancer as well as other relevant data including DNA microarray
derived data. A large number of proteins that are expressed in different types
of lung cancer have been identified and have been correlated with the expression 
measures for their corresponding genes at the RNA level. The database is intended to 
facilitate our effort at developing novel classification schemes for lung cancer and the 
identification of novel markers for early diagnosis. 



Keywords: Lung / Cancer / Database / Microarray 
1 Introduction

There is substantial interest in implementing novel and
comprehensive strategies for the molecular analysis of
tumors and relevant biological fluids. We have implemented
a strategy for the molecular analysis of lung cancer
that integrates genomic analysis using genome scanning
procedures, transcriptomic analysis using cDNA and
oligonucleotide microarrays, and proteomic analysis. For
the latter, we have relied to date primarily on 2-D
polyacrylamide gels. However, the 2-D gel approach is being
increasingly complemented with additional analyses
using liquid based protein separations and protein
microarrays. While on the one hand proteomic analysis
complements genomic analysis for a global assessment of
gene expression, on the other hand proteomic analysis
uniquely contributes an understanding of protein
post-translational modifications and the location of protein
gene products in subcellular compartments. The scope
of our overall molecular analysis study of lung cancer is
shown in Fig. 1. Important objectives include the development
of novel molecular classification schemes for lung
cancer and the identification of novel markers for the early
detection of lung cancer.

The large body of proteomic and other data we have collected
has necessitated the construction of a database in
which basic and derived data is organized. There have
been relevant related efforts at databasing of 2-D data
by other groups. One such database is the 2DWG
Meta-database of 2-D gel images, which contains 2-D derived
data acquired by a combination of review of results as
well as submissions by investigators [1]. However, to
date there are only three entries found matching the query
for human lung images in the 2DWG Web Gel Meta-database
web site (http://www-lecb.ncifcrf.gov/2dwgDB).
The database we have constructed, in its entirety, is relevant
to a variety of cancers. However, the focus of this
review is the use of the database to achieve our objectives
related to the molecular analysis of lung cancer
specifically. The goal of the database is to facilitate
planned analyses, i.e. statistical analysis, as well as
post-planned analyses, i.e. data mining. The intent is to
make the database queryable on a protein-by-protein
basis as well as through subgrouping of samples analyzed,
in a menu driven fashion. Internet and WWW technologies
are used not only to allow investigators to view
visual and textual data together, but also to allow investigators
in other locations to retrieve archival data using
different computer systems.

Correspondence: Dr. S. Hanash, University of Michigan Medical
Center, 1150 W. Medical Center Drive, A520 Medical Science
Research Building I, Ann Arbor, MI 48109-0656, USA
E-mail: slraiash@umich.edu
Fax: +1-734-647-8148

Abbreviation: LIPS, laboratory information processing system



2 Laboratory information processing 
system 

A long-standing laboratory information processing system
(LIPS) developed by our group [2] has been adapted
for our database. LIPS consists of multiple systems and
processes. A variety of data is stored in a variety of formats
with individualized programs for viewing the data. Typical
processes using LIPS include: sample inventory; digitize
images; detect and quantify spots; match spots and normalize
spot sizes across images; choose spots for MS
analysis; enter profiles from MS-Fit web search; transfer
data to statistical software or spreadsheets.

Data tend to be complex and dynamic in that their contents
are ever changing as information is added, modified
or removed. Simple or intensive analyses of 2-D patterns
have produced a large amount of data. Data is both textual
(e.g., reports and numbers) and visual (e.g., 1-D and
2-D gel images).

Some types of data generated by LIPS include: 2-D protein
gel images (silver, modified silver, blots, 35S-labeled
gels); genome scans; 1-D gel images; spot information and
protein names; gene information from DNA microarrays;
MS files and MS-Fit reports (Word documents); figures
(Raster files on the Sun and actual photographs); data
from protein microarrays; data from liquid chromatography
separations.

However, as computer technology has evolved, quantum
jumps in improvements in organizing unstructured, scientific
data into a structured database have become possible.
A major function of our database and its interfaces is to
serve as a computer-based tool for capturing the basic
quantitative data from 2-D gel images and derived data
and findings derived from different studies about proteins
detected in 2-D patterns of various tumor types [3]. As a
result, investigators are provided with easy access to data
as well as a means for intelligent data mining of the existing
data. A logical view of the database schema is shown in
Fig. 2 and a list of tables and their attributes is shown in
Table 1.

The following are important features of the 2-D gel related 
component of our lung protein database: 

(1) All 2-D gel images are placed in hierarchies so that: (a)
every study image is matched to one master image, i.e. all
lung adenocarcinoma tumor images are matched to one
master image; (b) every master image is matched to at
most one (higher) master image, i.e. all masters for different
lung tumor types are matched to one tumor master.
This allows the database to have an indexing mechanism
that can relate a spot to any gel in the hierarchy. The database
provides a capability to access the basic and
derived data using the following types of queries: (a) given
a spot on any gel, find all spots that are matched to it; (b)
given a spot on any gel, find all protein identifications
made for it; and (c) given a spot on any gel, find all
findings/conclusions that are linked to it.

(2) All samples (and thereby gels derived from them) are
identified by a list of source characteristics in four major
categories: experiment code; cell type code; treatment
code; and fraction code. This allows the database to
have an identification mechanism that can relate a gel to
any source in the hierarchy. The database provides a capability
to find all images as follows: (a) given a category,
find all images that have the same value of the category,
and (b) given any combination of four categories, find all
images that satisfy the condition.

(3) All protein spots are identified by a list of characteristics
in four major attributes: protein name; pI and Mr;
accession number; and protein sequence data. A spot
may have several findings and there may be many kinds
of findings derived from a particular study. If possible the
findings are recorded in a consistent way; however, this is
not always possible due to some characteristics of such
findings (e.g., statistical analysis matrices, MS data, and
Affymetrix data). As the number of studies has increased,
the amount of data produced has increased. Some of the
data (e.g. mass spectra and Affymetrix (Santa Clara, CA,
USA) oligonucleotide chip readouts) is very large, and fills
up the hard disks of the computers where it is collected.
Such data is generally saved on CD-Rs, and only the most
recent data is kept in a computer. It is sometimes easier
to post individual files on the web. Individual web pages
have been created with textual and visual data that are
difficult to relate in a table. This allows investigators an
opportunity to analyze 2-D gel and other images containing
spots that have not been detected or identified and to
compare data across studies. In addition this is used to
link our data to other biological knowledge repositories
such as GenBank, PIR International, and SWISS-PROT.

Table 1. A list of tables and their attributes in the lung protein database

Table name              Unique identifier        List of attribute types
                        (primary key)

Project                 Project Name             Project Type, Description
Sub Project             Sub Project Name         Date Started, Comment
Subject                 Subject ID               Case No, Sex, Birthdate, Comment
Tissue Sample           Tissue Sample ID         Tissue Type, Diagnosis, Date Sample Taken,
                                                 Date Received, How Received, Source, Comment
DNA Sample              DNA Sample ID            Date Produced, Concentration, Freezer Location,
                                                 Comment
Gel                     Gel ID                   Sample ID, Batch ID, Enzyme Combination,
                                                 Electrophoresis Process, Comment
Image                   Image Name               Date Imaged, Exposure Time, Image Type,
                                                 Image Location, Comment
Spot                    Image Name & Spot No     X, Y, Intensity, Spot Type
Match                   Match ID                 Master Image Name, Master Spot No, Image Name,
                                                 Spot No
Experiment              Experiment Code          Description
Cell Type               Cell Code                Description
Treatment               Treatment Code           Description
Fraction                Fraction Code            Description
Protein Sample          Sample ID                Experiment Code, Cell Code, Treatment Code,
                                                 Treatment Date, Fraction Code, Comment,
                                                 Project Type, Gel ID, Image Name, Image Type,
                                                 Researcher
Protein                 Protein Name             Image Name, Spot No
Other Link              Protein Link ID          Protein Name, Database Name, URL
Findings                Image Name & Spot No     Category, Designation, Finding
Protein Identification  Image Name & Spot No     Accession No, cDNA Cloning, Cell Lines, Facility,
                                                 Date, Genomic Cloning, Glycosylation, Mr, pI,
                                                 Phosphorylation, Phosphorylation Residues,
                                                 Related Spot, Sequences, Source of Protein, Name,
                                                 Structural Modification, Subcellular
                                                 Localization, Tissue Distribution,
                                                 Type of Membrane, Type of Sequencing
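
The hierarchy and category queries described in features (1) and (2) can be sketched with an in-memory SQLite database. This is a minimal illustration under stated assumptions, not the authors' implementation: the table `spot_match` stands in for the Match table (renamed here because MATCH is an SQL keyword), `protein_sample` carries only a subset of the Protein Sample attributes, and all image names, codes and helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE spot_match (            -- stands in for the Match table
    match_id          INTEGER PRIMARY KEY,
    master_image_name TEXT, master_spot_no INTEGER,
    image_name        TEXT, spot_no        INTEGER
);
CREATE TABLE protein_sample (        -- subset of the Protein Sample table
    sample_id       INTEGER PRIMARY KEY,
    experiment_code TEXT, cell_code TEXT,
    treatment_code  TEXT, fraction_code TEXT,
    image_name      TEXT
);
""")
# Hypothetical hierarchy: study images -> type master -> tumor master.
conn.executemany("INSERT INTO spot_match VALUES (?, ?, ?, ?, ?)", [
    (1, "tumor_master", 10, "adeno_master", 7),
    (2, "adeno_master", 7, "adeno_01", 112),
    (3, "adeno_master", 7, "adeno_02", 98),
    (4, "tumor_master", 10, "squamous_master", 3),
    (5, "squamous_master", 3, "squamous_01", 45),
])
conn.executemany("INSERT INTO protein_sample VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "E1", "ADENO", "NONE", "WHOLE", "adeno_01"),
    (2, "E1", "ADENO", "NONE", "MEMBRANE", "adeno_02"),
    (3, "E2", "SQUAMOUS", "NONE", "WHOLE", "squamous_01"),
])


def top_master(conn, image, spot):
    """Follow master links upward until a spot with no master is reached."""
    while True:
        row = conn.execute(
            "SELECT master_image_name, master_spot_no FROM spot_match "
            "WHERE image_name = ? AND spot_no = ?", (image, spot)).fetchone()
        if row is None:
            return image, spot
        image, spot = row


def descendants(conn, image, spot):
    """Collect every spot matched, directly or indirectly, below a spot."""
    found, stack = [], [(image, spot)]
    while stack:
        current = stack.pop()
        rows = conn.execute(
            "SELECT image_name, spot_no FROM spot_match "
            "WHERE master_image_name = ? AND master_spot_no = ?",
            current).fetchall()
        found.extend(rows)
        stack.extend(rows)
    return found


def matched_spots(conn, image, spot):
    """Query (1a): all spots in the hierarchy matched to the given spot."""
    root = top_master(conn, image, spot)
    everything = [root] + descendants(conn, *root)
    return sorted(s for s in everything if s != (image, spot))


ALLOWED = {"experiment_code", "cell_code", "treatment_code", "fraction_code"}


def images_for(conn, **criteria):
    """Query (2b): images whose sample matches any combination of codes."""
    unknown = set(criteria) - ALLOWED
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    where = " AND ".join(f"{column} = ?" for column in criteria)
    sql = "SELECT image_name FROM protein_sample"
    if where:
        sql += " WHERE " + where
    sql += " ORDER BY image_name"
    return [row[0] for row in conn.execute(sql, tuple(criteria.values()))]


print(matched_spots(conn, "adeno_01", 112))
print(images_for(conn, cell_code="ADENO", fraction_code="MEMBRANE"))
```

Because every image has at most one master, climbing to the top of the hierarchy and then descending reaches every spot related to the starting spot, whichever gel the query starts from.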



3 Contents of the lung cancer protein 
database 

A large number of studies involving lung cancer have been
independently performed in the laboratory. At the protein
level, these studies have resulted in 1349 images, over
1 000 of which are images of 2-D gels for which information
has been recorded in the lung protein database. This number
represents a fraction of the 30 682 2-D gels produced by
our group for different studies, which include studies of other
cancer types encompassing head and neck, esophagus,
liver, colon, pancreas, ovary, breast, prostate, brain,
leukemias and childhood tumors. A list of protein gel images
related to lung studies is shown in Table 2. While lung
adenocarcinomas represent a major portion of the database,
other lung tumor types including squamous cell carcinomas
and small cell lung cancers are represented, as are control
lung tissues. Other 2-D patterns were produced from
studies of cell lines that have been manipulated by transfection
or by treatment with specific agents, as well
as patterns produced after different cell fractionation
schemes. Substantial emphasis is currently being placed
on the comprehensive profiling of lung cancer derived
surface membrane proteins.

Table 2. A high-level categorization of lung protein 2-D
images by sample type

Lung sample type     Number of images

Cell Lines           421
Cystic Fibrosis      44
Tumor                635
Normal               170
Plasma               61
Other                18
Total                1349

Mass spectrometry and/or N-terminal sequencing of protein
spots from 2-D gels of lung tumor samples or cell
lines has led to the identification of a large number of
proteins expressed in lung cancer. Also, most identifications
made for proteins from a sample type can often
be confidently transferred to matching protein spots on
master images from lung studies. Table 3 and Fig. 3
exhibit some of the progress we have made in identifying
proteins in 2-D gels of lung samples.

Figure 3. Small cell lung tumor master containing identified proteins.
Spot #


NCBI 

Accession 

Number 


GenBank 
Number 


p/ 




Official 
gene 














577 * 


398953 


P31947 


4.361 


30.052 


SFN 


615 


112695 


P29312 


4.569 


29.101 


YWHAZ 


1279 


437363 


AAA35483 






YWHAH 


24 


2507178 


P16118 






PFKFB1 


928 


4502201 


NP_001649 


6.31 


20.697 


ARF1 


319 










ALB 


800 










ALB 








5.957 


70244 


ALB 


207 


4502031 


NP_000680 


6.811 


56566 


ALDH1A1 


543 


3493209 


AAC36469 


7.812 


32379 


AKR1B10 


1A 
i** 


130737 


P05187 


5.86 


57.954 


ALPP 


666 










ALB 


693 










ALB 


332 


4503571 


NP_001419 


7.742 


45.407 


ENO1


268 


8272482 


AAF24221 






HCR 


268 


5360901 


BAA82158 








172 


5174477 


NP_006073 


5.099 


52.848 




802 






4.796 


17.194 




460 


113944 


P04083 


6.73 




ANXA1 


522 


4502107 


NP_001145 


4.83 


33.326 


ANXA5 


685 






5.124 


25.4 




1278 


71967 


LNHUPS 








795 






6.33 


18.909 


ARF1 


96 


227920 


1713410A 


5.34 


14.584 


LGALS1 


349 


113270 


P02570 


529 


41.7 


ACTB 


61 


29497 


X59511 






SPTB 


229 


4507729 


. NPJX)1060 


4.75 


49.8 


T! IDD 

TvJBd 


22 


11995077 


AB038211 








104 


4757900 


NP_004335 


3.668 






469 






3.442 


48.772 




36 


4929561 


AAD34041 


6.25 


49596 




f H57 


4502643 


NPJX)1753 


7.034 


60547 


CCT6A 


1338 


4502899 


NP_001824 






CLTA 


789 










COL15A1 


85 


1362772 


E57233 






CPLX2 


856 






5.415 


11.858 


CRABP2 


855 


4506451 


NP_002890 


4.667 


10597 


RBP1 


439 


180570 


AAC31758 


5.34 


42.618 


CKB 


872 






4.568 


9.2 


KRT8 


321 


1673575 


U76549 







ID 

Source 



Name 



L95 



LM 
LM 

L95 

DMS79 



LM 
LM 
LM 

SKMES 



LM 

L95 
L95 

LM 

LM 

A549 

LM 

LM 

L95 

LM 

SKMES 
LM 

DMS79 
LM 

DMS79 

LM 
LM 

A549 
A549 
L95 

DMS79 
LM 

LM 



LM 

A549 



(spot1496L) possibly
pancreatitis-associated
protein

14_3_3_sigma

14_3_3_ZetaDelta

14-3-3 η

6PF-2-K/FRU-2,6-P2ASE

Liver isozyme
ADP-ribosylation factor 1
Albumin
Albumin
Albumin

Aldehyde Dehydrogenase
AldoKeto Reductase
Alkaline Phosphatase,

Placental type 1 precursor
Albumin
Albumin
α-Enolase
α-helical protein
α-helix coiled-coil rod
homologue
α-Tubulin
Amyloid B4A
Annexin I
Annexin V
ApoAI

Apoprotein, pulmonary

surfactant
ARF1

β-Galactoside soluble lectin

β-Actin

β-Spectrin

β-Tubulin

Calmodulin dependent
phosphodiesterase

Calreticulin

Calreticulin32

CGM6 protein

Chaperonin-like protein

Clathrin light chain A

Collagen, type XV, α 1

Complexin II

Cellular retinoic acid-
binding protein 2

Cellular retinol-binding
protein 1, CRPB1

Creatine kinase, brain

Cytochrome C oxidase VA

Cytokeratin 8
Table 3. Continued



ID 

Source 



Name 



Spot# 


NCBI 

Accession 
Number 


GenBank 
Number 


p/ 




Official 
gene 


446 


2506774 


P05787 


5.52 


53.674 


KRT8 


439 


2506774 


P05787 


5.52 


53.674 




289 


4504915 


NP_002266


4.153 


49.261 


KRT15 


759 


118674 


P09622 








811 


6005749 


NP_009193 


6.44 


21.015 


UJ1 


700 






6.263 


24.001 




57 


6969163 


CAB75301 








769 






5.719 


20.136 




1445 


4885417 


AB022435 






HIP2 


718 






5.104 


22.961 




839 






4.599 


10.957 




902 










ERH. 


295 


119347 


P09104 


4.94 


47.286 


ENO2


18 






4.945 


78.717 




1519 


5453559 


NP_0063475 


5.21 


18.491 


ATP5JD 


31 


3041657 


P24864 






CCNE1 


540 






7.457 


31.772 




348 


113278 


P02571 


5.146 


42.315 


ACTG1 


650 


417246 


Q04760 


4.833 


25.572 


GLO1


86 


117561 


P04141 






CSF2 


87 






5.9341 


73.124 




79 






5.187 


68.109 




690 


726098 


AAC13869 


5.5 


25.4 


GSTP1 


626 


123571 


P04792 


7.83 


22.327 


HSPB1 


631 


123571 


P04792 


7.83 


22.327 


HSPB1 


457 


5031753 


NP_005511






HNRPH1


818 


511776 


U11269 


5.55 


oe ceo 
00.000 




120 






o.oyo 






46 








7fi flOfi 




1036 


6841118 


AAF28912 








1547 


6841292 


AAF28999 








1548 


6841292 


AAF28999 








181 


4504521 


NP_002147 


5.7 


61 


norUl 


1595 


1708113 


P54255 






HAP1 


1548 


1708113 


P54255 






HAP1 


183 


6225015 


Q16352 


5.48 


54.908 


INA 


934 


4557701 


NPJKM13 


4.97 


48.106 


KRT17 


26 


10047295 


AB046830 








340 






4.549 


44.03 





A549 
A549 
LM 
A549 



LM 
LM 

DMS 79 

LM 

L95 

LM 
LM 



LM 

A549 

DMS 79 
LM 
LM 
LM 

FMD79 



LM 
LM 
LM 



A549 

A549 

LM 

LM 

L95 

L95 

L95 

A549 

L95 

L95 



DMS 79 
LM 



Cytokeratin 8

Cytokeratin 8

Cytokeratin 15, keratin 15

Dihydrolipoamide dehydro-
genase, mitochondrial
precursor

DJ1

DJ1.MER5
dj475N16.1(CTG4A)
DUTPhase

E2 ubiquitin-conjugating

enzyme
EIF4d
EIF5A

Enhancer of rudimentary

(Drosophila) homolog
Enolase 2 (γ, neuronal)
ENPL_HSP100
F1F0-type ATP synthase

subunit d
G1/S specific cyclin E1
G3PD
γ-Actin
Glyoxalase I

Granulocyte-macrophage
colony-stimulating factor
precursor

GRP75

GRP78

GSTpi

Heat shock 27 kD protein 1
Heat shock 27 kD protein 1
Heterogeneous nuclear

ribonucleoprotein H
HLA-B71 or HLB-B71

variant
HSC70.HSP73
HSP90
HSPC089
HSPC321
HSPC321
HuCha60SP60
Huntingtin associated protein
Huntingtin associated protein
Internexin neuronal intermediate

filament protein, alpha
Keratin 17
KIAA1610 protein
LamR
ID

Source 



Name 



Spot# NCBI 

Accession 
Number 



GenBank 
Number 



Pi 



M r 



Official 
gene 



LM 
A549 

A549 

LM 



L95 

DMS79 



DMS79 



LM 

A549 



LM 

LM 
LM 
LM 
LM 
LM 
L95 
L95 
L95 



L95 
A549 

A549 



A427 



L95 



L95 
L95 



873 



460 
906 

906 

924 
924 

1338 
33 



74 



Lectin, galactoside-binding,

soluble, 1 (galectin 1)
Lipocortin

L-Lactate Dehydrogenase

H chain
L-lactate dehydrogenase

H chain (LDH-B)
LaminB

Lymphocyte cytosolic

protein 1 (L-plastin)
Macropain subunit zeta
MHC class 1

histocompatibility

antigen protein
Multicatalytic endopeptidase

complex chain C2,

long splice form
MyosinLightChain3
Nm23, NDPKA
Non metastatic cells 1,

protein (NM23A)
Op 18, leukemia-associated

phosphoprotein p18 (stathmin)
Op 18a
Op 18m

Phosphoglycerate MutB
Phospholipase C
PIMT

Pinch-2 protein
Pinch-2 protein
Possibly activin type II

receptor precursor;

DNA polymerase epsilon

subunit B; or ITF-1 DNA

binding protein
Possibly BTF2p44
Possibly carbonic anhydrase III 1 242

or UCH-L1; PGP9.5
Possibly Δ3,5-Δ2,4-

dienoyl-CoA isomerase

precursor
Possibly G1 to S phase

transition protein; serine-

threonine phosphatase

protein; or phosphatase

5 protein
Possibly GCF2 fusion

protein or Bamacan

homolog
Possibly glycosyltransferase
Possibly HLADQ



1496 



2138 



321 



320 



1519 
1271 



227920 

113944 
126041 

4557032 



1713410A 

PO4083 
P07195 

NP_002291 



4504965 NP_002289 



5.34 
6.73 



6.787 
6.20 



4506187 
1236790 



346314 



NP_002289 
U06487 



JC1445 



6.51 



14.584 LGALS1 

39.264 ANXA1 
LDHB 

LDMB 

69.625 

70.290 LCP1 

PSMAS 



30.239 



815 






4.11 


15.172 




1456 


127981 


P15531 


5.809 


19.216 




793 


4557797 


NP_000260 


5.83 


17.148 


NME1 


809 


5031851 


NP_005554 


5.783 


17.164 


LAP 18 


807 


5031851 


NP_005554 


4.962 


13.655 


LAP 18 


808 


5031851 


NP_005554 


5.302 


14.857 


LAP 18 


639 






7.083 


27.227 




248 






5.7 


56.5 




662 






6.211 


25.804 




1695 


9800509 


AAF99328 








1825 


9800509 


AAF99328 








627 
ID


Name 


Spot# 


NCBI 


GenBank 




M, 


Official 


Source 






Accession 


Number 






gene 






Number 










A549 


Possibly hydroxyacylglutathione 
hydrolase or B-lymphocyte 
Mniigen uulu 


1080 












Lyo 


rOSSiDiy rniwuLuiAiit* lki&cvj 
motor protein 


1438 












L95 


Possibly putative novel 
protein similar to HPS 














L95 


Possibly Spi-B; unnamed 

rtt-rttAin rtrrwii ir4 f AKOOI RAA\' 
pfO Id 11 prOUUCl ^r\J\W I 0*r**) t 

or protein kinase (fl 5801) 


1187 












L95 


Possiuiy I -complex proiein 


DOW 












A549 


Possibly U 1 small nuclear 
ribonuclear protein A 


1148 












L95 


Possibly unnamed protein 
product (AK000369) or 
syntaxin 


1064 












L95 


Possibly unnamed protein 
product or Pro0282p 


1351 














protein 










57.116 


P4HB 




procollagen-proline, 


110 


2507460 


P07237 


4.76 




2-oxoglutarate 4-dioxygenase
















(proline 4-hydroxylase), beta 
















polypeptide (protein disulfide 
















isomerase; thyroid hormone
















binding protein p55)












PCNA 




proliferating cell nuclear


515 


129697 


P17070 


4.4 


37.5 * 




antigen 












PPP2R1B 




Protein phosphatase 2 


104 


5915686 


P30154 


4.84 


66.202 




(formerly 2A), regulatory 
















subunit A (PR 65), β isoform














1 Hi 


Proton H rvpfi irw 


40 






3.714 


62.182 


HINT 


LM 


Protein kinase C inhibitor i 


GOO 




NP 005331 


7.714 


11.521 


L95 


Pulmonary surfactant 


iLrO 


190565 


AAA36510 






SFTPA1 
















SFTPA1 


L95 


Pulmonary surfactant- 




131412 


P07714 






LM 


D1Q700 ^ 


848 


3355455 


AAC27824 


7.508 


13.163 






Retinol-binding protein 1,


855 


4506451 


NP_002890


4.99 


15.850 


RBP1 




cellular














LM 


RoSS_A_Antigen 


69 






3.215 


47.903 


S100A11 




S100 calcium-binding


906 












protein A11 (calgizzarin) 












O 1 VAIrVO 




S100 calcium-binding


910 


115442 




D.0 I 






protein A8 (calgranulin A) 










13.291 


S100A9 




S100 calcium-binding 


931 


6094219 


P50117 


6.37 




protein A9 (calgranulin B) 










66.202 


PPP2R1B 


DMS79 


Serine/threonine protein 
phosphatase 2A, 65kDa 
regulatory SubunitA, 
p isoform 


14 


5915686 


P30154 


4.84 




SET translocation (myeloid 


376 


1711383 


Q01105 


4.12 


32.103 


SET 




leukemia-associated) 
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ID 


Name 


Spot* 


NCBI 


Gen Bank 


P' 


H 


Official 


Source 






Accession 


Number 






gene 






Number 












Small glutamine-rich 


476 


8134666 


043765 


4.81 


34.063 / 


SGT 




tetratricopeptide repeat
















(TPR)-containing










27.774 


SFN 




Stratifin 


577 


398953 


P31947 


4.68 


L M 


Superoxide dismutase 1


792 


134611 


P00441 


5.6 


17.3 


SOD1 


L M 


Superoxide dismutase 2,
mitochondrial 


737 


134665 


P04179


7.887 


20.78 


SOD2 


LM 


TCP 1 β subunit


202 






5.89 


59.841 




LM 


TCTP (translationally-controlled
tumor protein 1)


680 


4507669 


NP_003286 


4.688 


25.143 


TPT1 


1 M 


Thioredoxin 


896 






4.689 


9.207 




1 M 


Tplastin HSP 70 


125 






5.862 


68.909 




LM 


1 ranSliijl Clin 


842 






5.693 


14.714 


TPI1 




Triosephosphate isomerase


672 


136060 


P00938 


7.2 


25.5 




Tropomyosin, cytoskeletal 


550 


136096 


P12324 


4.5 


31.9 






type, tropomyosins 












TPM4 


LM 


Tropomyosin 4


548 


13274400 


AAK17926 


4.377 


32.733 




Troponin T 


866 


408217 


AAB27731 








L95 


Troponin T 


778 


408217 


AAB27731 








Tubulin, β polypeptide


229 




MP 001060 

111 \J\J 1 wv 


4.78 


49.907 


TUBB 


DMS 79 


Tumor associated hydroquinone 34 


6644167 


AF207881 










(NADH) oxidase tNOX
















tyrosine 3-monooxygenase/ 


576 


I I DO l y O 




4.63 


29.174 


YWHAE




tryptophan 5-rnonooxyge- 










< 






nase activation protein, 
















epsilon polypeptide










26.645 


YWHAZ




tyrosine 3-monooxygenase/


615 


112695 


P29312 


4.73 




tryptophan 5-monooxy- 
















genase activation protein. 
















. zeta polypeptide 
















Tyrosine 3-monooxygenase/


579 


112690 


P27348 


4.68 


27.764 


YWHAQ 




tryptophan 5-monooxy- 
















genase activation protein, 
















theta polypeptide 










27.745 


UCHL1 


A549 


Ubiquitin carboxyl-terminal 
esterase L1 (ubiquitin 
thiol esterase), UCH-L1 ; 
PGP 95, GSTmu 


656 


lODOO 1 




5.283 


L95 


Unnamed protein product 


1270 


7023092 


BAA91833 




31.263 




AA4Q 


Urokinase plasminogen 
activator 


842 


487123 


. S39495 


6.01 




LM 


V*id1 


293 






4.712 


47.485 




LM 


Vid2 


294 






4.614 


46.369 




LM 


VkJ4 


337 






4.464 


45.322 


VIM 




Vimentin 


294 










A427 


Vimentin 


606 


4507894 


NM_003380






VIM 


A549 


Vimentin 


505 


418249 


P08670






VIM 


A549 


Vimentin 


47 


340234 


M25246 






VIM 
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In addition to 2-D gel analysis, most lung adenocar- 
cinomas are examined at the genomic level using restric- 
tion landmark genome scanning, and by mutation analy- 
sis for a small number of genes. Transcriptomic analysis is 
done primarily using oligonucleotide microarrays, as part 
of our efforts to derive a molecular based classification of 
lung adenocarcinomas that is more predictive of clinical 
behavior for this group of tumors than current classifi- 
cation schemes. We also have similar molecular analyses 
of control lung tissue obtained from multiple sources 
including adjacent lung tissue from lung cancer patients 
as well as tissues obtained from non-cancer resected lung. 

Only a fraction of the information in the 2-D patterns has 
been linked across all studies and analyses. The lung pro- 
tein database contains the basic descriptive data of var- 
ious samples analyzed, the images of the 2-D patterns 
that resulted from these samples, the quantitative spot 
data and information about which spots have been 
matched to each other, and conclusions or findings about 
spots. The database is intended to allow not only the
retrieval of existing data, but also to mine new information 
and knowledge about protein expression in lung cells. 
Data mining activities consist, for example, of reviewing 
previous studies and finding out which 2-D gel patterns 
and protein spots are interesting for post-planned analy- 
sis and new discoveries. Such discoveries derive from: 
(1) identification of proteins that exhibit interesting expres-
sion profiles in 2-D patterns that have been regrouped
from different experiments and studies; (2) expanded 
statistical analyses that cover protein expression patterns 
involving large numbers of experiments and images; 
(3) relating our data involving proteins to outside informa- 
tion; and (4) relating proteomic data to genomic data. 



4 Use of the database for post planned 
analysis 

4.1 Virtual matching 

Interactive software packages are used to automatically 
detect and quantify spots and to match spots between 
different protein patterns, with visual editing to correct 
any errors in computer based matching. The spot match 
program has created indices that allow investigators to 
quickly navigate through many gels and easily compare 
spots on images from many different experiments and 
studies, discover proteins of interest, and access and 
view relevant data. Here the term "match" is used as a
logical "transitive" relation, which means that if spot A is
matched to spot B and spot B is matched to spot C,
then spots A and C are considered matched. The
lung protein database contains data on proteins detected 
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on various 2-D gels. Since all gels derived from whole 
cell or tissue lysates in the lung protein database are 
tied into a single hierarchy, protein identification data 
recorded for a spot is used to derive protein data for its 
matched spots using an advanced query capability of 
the database. This is known as "virtual matching" or "vir-
tual protein identification", which allows investigators to
access and view all matched images and the corres- 
ponding information from the lung protein database. 
With a click on a spot, one gets the result shown in 
Fig. 4. The virtual protein identification feature does not 
provide a 100% level of certainty of protein identification,
but it makes possible the display of spots of interest. A 
combination of automated recognition and manual edit- 
ing generally yields an accurate record of protein infor- 
mation in the database for previously unknown proteins. 
With this approach, the lung protein database will evolve 
and mature to include all correct data for further analysis 
and data mining. 
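
As an illustration only, the transitive "match" relation and the resulting virtual protein identification could be sketched with a union-find structure. The class name, function name, and the `(gel, spot)` keys below are hypothetical and are not taken from the lung protein database itself; the sketch simply assumes that each spot is keyed by its gel and spot number.

```python
from collections import defaultdict


class SpotMatcher:
    """Union-find over spots; 'matched' is treated as a transitive relation."""

    def __init__(self):
        self._parent = {}

    def find(self, spot):
        # Register unseen spots as their own group, then walk to the root.
        self._parent.setdefault(spot, spot)
        root = spot
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited spot directly at the root.
        while self._parent[spot] != root:
            self._parent[spot], spot = root, self._parent[spot]
        return root

    def match(self, a, b):
        # Record that spot a on one gel is matched to spot b on another.
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def matched(self, a, b):
        return self.find(a) == self.find(b)


def virtual_identifications(matcher, known_ids, spots):
    """Propagate identifications recorded for one spot to all matched spots."""
    by_group = defaultdict(set)
    for spot, name in known_ids.items():
        by_group[matcher.find(spot)].add(name)
    return {s: sorted(by_group.get(matcher.find(s), ())) for s in spots}


matcher = SpotMatcher()
matcher.match(("gel_A", 294), ("gel_B", 17))
matcher.match(("gel_B", 17), ("gel_C", 5))
known = {("gel_A", 294): "vimentin"}  # hypothetical identification record
result = virtual_identifications(matcher, known, [("gel_C", 5), ("gel_D", 1)])
```

A spot whose group carries more than one name would signal a conflict that needs manual editing, consistent with the less-than-100% certainty noted above.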

4.2 Integrating protein spot data with MS data 

As interest in proteomic analysis grows, a number of very 
large public databases are available to access protein 
data via the internet. Public databases offer a sophis- 
ticated text search and keyword search, which links any 
entered keyword to all protein information associated with 
that keyword, to ensure easy access to all relevant data.
Protein identification using MALDI-MS relies on database 
searches and usually has three components: (1) peak 
detection which allows automatic determination of pep- 
tide masses; (2) search in protein sequence databases 
(SWISS-PROT and/or GenBank) for protein entries that 
match the masses; and (3) certainty calculation which 
determines the quality of the match for each protein in 
the list [4]. An example of such a software tool is Pep-
Frag, which searches protein and DNA sequence databases
and can use different types of mass spectrometric infor-
mation [5]. Fenyo [6] described methods and software
tools in proteomics for identifying and characterizing pro- 
teins, which emphasizes MS combined with database 
searching. Proteolytic peptide mapping and genome 
database searching provide an automated means for 
identifying proteins, and the certainty of the results is 
computed by the number of masses matched for each 
protein [7]. Another useful tool is FindMod (http://www.
expasy.ch/sprot/findmod.html) for the systematic charac-
terisation of proteins using mass spectrometry [8].
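
The mass-matching and certainty steps described above can be sketched roughly as follows. This is a simplified illustration, not the scoring used by any of the cited tools: the function names, the 50 ppm tolerance, and the example masses are all assumptions, and candidates are ranked simply by the number of observed masses they match.

```python
def delta_ppm(observed, theoretical):
    """Mass error of an observed peptide mass in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_masses(observed_masses, theoretical_masses, tolerance_ppm=50.0):
    """Pair each observed mass with its closest theoretical mass within tolerance."""
    matches = []
    for obs in observed_masses:
        best = min(
            theoretical_masses,
            key=lambda theo: abs(delta_ppm(obs, theo)),
            default=None,
        )
        if best is not None and abs(delta_ppm(obs, best)) <= tolerance_ppm:
            matches.append((obs, best, delta_ppm(obs, best)))
    return matches


def rank_candidates(observed_masses, candidates, tolerance_ppm=50.0):
    """Rank candidate proteins by number (and percentage) of masses matched."""
    scored = []
    for name, theoretical in candidates.items():
        n = len(match_masses(observed_masses, theoretical, tolerance_ppm))
        pct = 100.0 * n / len(observed_masses) if observed_masses else 0.0
        scored.append((name, n, pct))
    return sorted(scored, key=lambda row: row[1], reverse=True)


# Hypothetical peptide masses (Da), chosen only to exercise the functions.
observed = [1000.0, 1500.0, 2000.0]
candidates = {
    "protein_A": [1000.01, 1500.02, 2500.0],
    "protein_B": [1000.5, 1800.0],
}
ranking = rank_candidates(observed, candidates)
```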

We have created MS data forms that contain information 
used in mass spectrometry queries, summary information 
(Rank, MOWSE score, % Masses Matched, MW, pI, Spe-
cies, Accession #, Protein Name) and additional informa-
tion (Summary ID, Submitted Mass, Matched Mass, Delta
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Figure 4. Virtual protein identification by clicking a spot 



PPM, Start, End, Peptide Seq. Modifications, Unmatched 
Masses). An example of the MS data form is shown in 
Fig. 5. Integrating the lung protein database with MS 
data provides a record of protein identification and high 
level of integration with other public databases, although 
substantial effort is required for data collection. We are 
currently evaluating an automated or semi-automated 
method of pulling these data when new information, 
which is relevant to our objectives, is available. 



4.3 Integrating protein data with microarray data 

As technology evolves, new computer aids and methods 
are introduced for genomic analysis as well as proteo- 
mic analysis. With respect to DNA microarray platforms, 
a current goal is to construct lung specific cDNA micro- 
arrays for lung cancer investigations. In the meantime 
RNA expression data for lung cancer is being collect- 
ed using an Affymetrix oligonucleotide based system. 
This system automates the identification and quantifica- 
tion of microarray spots. Data files contain integrated 
intensities for each spot and ratios showing fold changes 
per spot. The use of oligonucleotide based microarrays 
for RNA analysis in lung cancer by our group has resulted 



in a massive amount of data. Integration of protein infor- 
mation in the lung protein database with microarray data 
allows us to extend data analysis capability to encompass 
RNA and protein data for a subset of genes. 



5 Some findings derived from the lung 
cancer protein database 

5.1 Unique proteomic pattern of small cell 
lung cancer 

A major goal of our proteomic and genomic studies of 
lung cancer is to derive novel classification schemes that 
have utility in making a diagnosis, predicting outcome and 
in making therapeutic decisions. An important first step in 
this direction is to determine the ability of proteomic pro- 
filing to distinguish between known types of lung cancer. 
Specific protein differences between different types of 
cancer have been identified by other groups. In a recent 
study of breast, ovary and lung tumors, 20 differentially 
expressed proteins were identified [9] and in a prior study, 
16 polypeptides were found to be associated with differ-
ent histopathological features of lung cancer [10, 11]. In a
study of 25 adenocarcinomas of the lung, 12 small cell 
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Figure 5. MS data form 



lung cancers, and 16 squamous cell tumors, by our group
(manuscript submitted) an initial analysis of protein 2-D
patterns uncovered a group of 52 protein spots that dif-
fered in average integrated intensity between the three
groups. Performing simple two-sample t-tests gave p
values of less than 0.05 for the 52 spots for at least one
of the pairs of groups. Most of the spots differed between
small cell and the remaining two diagnostic groups, with
47 spots differing significantly between small cell and
adenocarcinoma groups and 44 between small cell and
squamous (p<0.05). Between the adenocarcinoma and



squamous groups 12 spots with difference of this signifi- 
cance were found. Summary data for some of the spots is 
presented in Table 4. The first two principal components 
of the data are graphed in Figure 6, and show that as a 
group the spots distinguish small cell tumors from the 
other two tumor types fairly easily. 
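
For readers who want to see the arithmetic behind such a comparison, a two-sample t statistic for one spot could be computed as in the sketch below. It assumes the pooled-variance (Student) form; the text does not say which variant was used, and the intensity values are invented for illustration. Converting the statistic to a p value requires the t distribution, which a statistics package would normally supply.

```python
import math
from statistics import mean, variance


def two_sample_t(group_a, group_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    # Pooled estimate of the common variance of the two groups.
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical integrated intensities of one spot in two tumor groups.
small_cell = [1.0, 2.0, 3.0]
adenocarcinoma = [2.0, 3.0, 4.0]
t_stat, dof = two_sample_t(small_cell, adenocarcinoma)
```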

We have identified 39 of this set of 52 spots by
N-terminal sequencing and/or MS of spot digests. Small
cell lung cancers were characterized by higher average
amounts for some proteins associated with cell prolifera-
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Table 4. 39 identified protein spots found to differ between small cell, adenocarcinoma, and squamous tumors of the lung 
(n = 12, 25, 16). In the t-test columns are p values from the two-sided two-sample t-test comparing each pair of groups.





Spot # | Unigene description | Official gene symbol | Mean adenocarcinoma | Mean squamous | Mean small cell | t-test adenocarcinoma vs small cell | t-test small cell vs squamous | t-test adenocarcinoma vs squamous


294 


vimentin 


VIM 


1.36 


1.16 


0.53 


n Ai ft 
□.010 


U.Ulo 




319 


albumin 


ALB 


2.13 


l.D/ 


ft 70 
U. /O 


n ftfii 


0.005 


0.231 


666 


albumin 


ALB 


0.72 


0.59 


0.20 


0.002 


n AOA 


UAO 1 


800 


albumin 


ALB 


2.34 


1 on 


n co 


n ft.m 


n nil 


0.383 


873 


lectin, galactoside-binding, soluble, 1 


LGALS1 


1.95 


•i cn 

i.by 


ft DO 
U.DO 


u.uuu 


n no? 


0.310 




(galectin 1) 












0.046 


O.607 


928 


ADP-ribosylation factor 1 


ARF1


0.22 


0.19 


0.06 


0.012 


522 


annexin A5 


ANXA5 


0.46 


{J.cv 


ft QO 


0.429 


0.202 


0.012 


515 


proliferating cell nuclear antigen 


PCNA 


Air 


U. 10 


ft Ofi 


o no? 


0.011 


0.464 


577 


stratifin 


SFN 


0.78 


1.39 


0.41 


0.129 


0.002 


0.029 


626 


heat shock 27 kD protein 1


HSPB1 


0.87 


1.18 


0.30 


0.000 


U.QCK 


ft -too 


631 


heat shock 27 kD protein 1


HSPB1


1.04 


1.35 


0.46 


0.003 


0.017 


0.277 


793 


non-metastatic cells 1, protein (NM23A) 


NME1 


0.36 


0.43 


0.59 


0.003 


0.033 


0.253 


OUf 


leukemia-associated phosphoprotein p18


LAP 18 


0.03 


0.05 


0.92 


0.000 


0.000 


0.351 




(stathmin)










0.000 


O.OOO 


0.732 


809 


leukemia-associated phosphoprotein p18 


LAP18 


0.55 


0.50 


3.88 




(stathmin)












0.001 


0.447 


931 


S100 calcium-binding protein A9


S100A9 


0.95 


1.18 


0.24 


0.026 




(calgranulin B)












O.0Q1 


0.188 


104 


protein phosphatase 2 (formerly 2A), regulatory


PPP2R1B 


0.17 


0.13 


0.65 


0.000 




subunit A (PR 65), beta isoform












0.049 


0.906 


110 


procollagen-proline, 2-oxoglutarate 4-dioxy-
genase (proline 4-hydroxylase), beta
polypeptide (protein disulfide isomerase; 


P4HB 


0.10 


0.10 


0.30 


0.014 




thyroid hormone binding protein p55) 












0.000 


0.751 


183 


internexin neuronal intermediate filament


NA 


0.04 


0.04 


0.16 


0.000 




protein, alpha 












0.000 


0.028 


229 


tubulin, beta polypeptide 




U. 14 




u.oo 


0.000 


289 


keratin 15 


KRT15 


0.36 


0.29 


0.65 


0.028 


0.009 


0.343 


295 


enolase 2, (gamma, neuronal)


ENO2


0.10 


0.23 


0.39 


0.000 


0.065 


0.026 


376 


SET translocation (myeloid leukemia- 


SET 


0.25 


0.17 


0.71 


0.000 


0.000 


ft IY21 




associated) 












0.000 


O.004 


439 


creatine kinase, brain 


CKB 


0.11 


0.05 


0.16 


0.033 


460 


annexin A1 


ANXA1 


0.43 


0.42 


0.59 


0.014 


O.026 


0.691 


476 


small glutamine-rich tetratricopeptide


SGT 


0.16 


0.19 


0.33 


0.000 


0.000 


0.241 




repeat (TPR)-containing










0.000 


0.001 


0.697 


576 


tyrosine 3-monooxygenase/tryptophan
5-monooxygenase activation protein,


YWHAE 


0.40 


0.38 


0.82 




epsilon polypeptide










0.000 


0.006 


0.703 


579 


tyrosine 3-monooxygenase/tryptophan
5-monooxygenase activation protein,


YWHAQ 


0.52 


0.55 


0.91 




theta polypeptide 














0.336 


615 


tyrosine 3-monooxygenase/ 
tryptophan 5-monooxygenase 
activation protein, zeta polypeptide 


YWHAZ 


0.93 


1.09 


1.79 


0.000 


0.003 
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Spot 
# 


Unigene description 


Offical 
gene 


Mean 
adeno- 


Mean 
squa- 


Mean 
small 


f-test 

dUcnUCai" 


f-test 

email ppII 
oil loll ecu 


i- test 

auci ltA*al 








car- 


mous 


cell 


cinoma vs 


vs squa- 


cinoma vs 












small cell 


mous 


squamous 


OOO 


ubiquitin carboxyl-terminal esterase L1


UCHL1


0.17 


0.32 


0.85 


0.000 


0.005 


0.153 




(ubiquitin thiolesterase) 












0.014 


0.961 




retinol-binding protein 1, cellular


RBP1


0.42 


0.41 


0.77 


0.006 


856 


cellular retinoic acid-binding protein 2 


CRABP2 


0.25 


0.38 


0.63 


0.000 


0.017 


0.037 


902 


enhancer of rudimentary (Drosophila)


ERH 


0.38 


0.35 


0.76 


0.000 


0.000 


0.455 




homolog












0.001 


0.950 


910 


S100 calcium-binding protein A8 


S100A8 


1.46 


1.43 


0.35 


0.040 




(calgranulin A)












0.073 


0.013 


934 


keratin 17 


KRT17 


0.16 


0.30 


0.15 


0.768 


693 


albumin


ALB 


2.63 


1.98 


0.92 


0.000 


0.008 


0.138 


737 


superoxide dismutase 2, mitochondrial 


SOD2


1.17 


1.22 


0.54 


0.013 


0.001 


0.836 


789 


collagen, type XV, alpha 1 


COL15A1


0.57 


0.50 


0.26 


0.031 


0.186 


0.658 


906 


S100 calcium-binding protein A11


S100A11 


2.95 


2.62 


0.53 


0.000 


0.000 


0.506 




(calgizzarin)










0.000 


0.004 


0.034 


924 


lymphocyte cytosolic protein 1 (L-plastin)


LCP1 


0.18 


0.13 


0.05 



Figure 6. First two principal components for 52 protein
spots distinguishing between lung tumor types. Small 
cell lung cancer samples are shown as squares, adeno- 
carcinomas as circles and squamous lung tumors as 



tion such as proliferating cell nuclear antigen (PCNA) and
oncoprotein 18 (Op18) [12-15], particularly the once-
phosphorylated form of Op18, as well as protein products
of the UCHL1, RBP1, CRABP2, KRT15, and TUBB genes
among others. Squamous cell and adenocarcinoma sam-
ples had greater amounts of the S100 proteins S100A8,
S100A9, and S100A11, as well as larger average amounts
of both the unphosphorylated and phosphorylated 27 kD
heat shock protein (HSPB1). These two groups also had



larger amounts of several protein spots detected on these
gels that did not occur in similar gels made from cell lines
and were thought to be cleavage products from proteins
present in cells or plasma surrounding the tumor cells
(e.g. cleaved albumin). The number of protein spots that
differed between lung adenocarcinomas and squamous
tumors was smaller than the number of proteins that dis-
tinguished between small cell lung cancer and the other
two lung cancer types. ENO2 was smallest in the adeno-
carcinoma group, while ANXA5 and CKB were lowest and
KRT17 and SFN highest in the squamous carcinoma sam-
ples. Several interesting spots found in the study remain
to be definitively identified.

5.2 Correlations between RNA and protein 
expression 

The availability of mRNA expression data from micro- 
arrays or Affymetrix chips for the same samples for which 
we have protein 2-D gel data permits several additional 
types of questions to be asked. We have thus far enter-
tained only simple models of protein/mRNA relationships
that ask which mRNA levels are most correlated with pro-
tein spot sizes. Figure 7 depicts such a correlation matrix
using colors rather than numerical data, since this makes 
it easier to visualize the relationships. In cases for which 
the identity of the protein spot is known such investiga- 
tions can answer the question of how well mRNA levels 
for a protein predict that protein's abundance. In cases 
of protein spots that have not yet been identified, or iden- 
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Figure 7. Correlation matrix of 30 protein spots (columns) 
with mRNA levels as measured by 200 probe-sets on 
Affymetrix HuFL chips. The correlation coefficients are 
depicted with colors, bright red being near-perfect corre-
lation (r = 1) and bright green anticorrelation (r = -1). The
figure was made using the TreeView software (rana.lbl.
gov/EisenSoftware.htm).

tified without high confidence, such correlations can lead 
to or confirm hypothetical spot identifications. More gen- 
erally one can search for larger groups of proteins and 
mRNA whose abundances are controlled by some com- 
mon mechanism. 
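
A correlation matrix of the kind described above could be computed along the following lines. This is a minimal sketch that assumes Pearson's correlation coefficient and one value per sample for each protein spot and each probe-set; the identifiers and numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(protein_spots, mrna_probes):
    """Correlate every protein spot with every mRNA probe-set across samples."""
    return {
        (spot, probe): pearson(spot_values, probe_values)
        for spot, spot_values in protein_spots.items()
        for probe, probe_values in mrna_probes.items()
    }


# Hypothetical measurements for four samples, listed in the same sample order.
protein_spots = {"spot_A": [1.0, 2.0, 3.0, 4.0]}
mrna_probes = {
    "probe_X": [2.0, 4.0, 6.0, 8.0],  # rises with the protein spot
    "probe_Y": [4.0, 3.0, 2.0, 1.0],  # falls as the protein spot rises
}
matrix = correlation_matrix(protein_spots, mrna_probes)
```

Values near 1 or -1 would correspond to the bright red and bright green cells of a color-coded display.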



5.3 Identification of novel lung cancer markers 

We have utilized a proteomic approach to identify pro- 
teins that commonly induce an antibody response in lung 
cancer. Such identified proteins or their corresponding 
autoantibodies likely have substantial utility for cancer 
diagnosis. There is also evidence that autoantibodies 
may be present prior to clinical diagnosis and therefore 
detection of autoantibodies or of circulating antigens 
may have utility for screening and early diagnosis of can- 
cer. We have identified a battery of proteins that induce 
autoantibodies that are specific for different types of can- 
cer. We have identified a panel of autoantibodies that are 
detectable in serum of lung cancer patients at the time 
of diagnosis. The availability of a database of protein 



expression in lung cancer has facilitated the identification
of proteins that induce autoantibodies in addition to
providing valuable information regarding the expression
pattern of such antigens in different tumor types and cell
lines. One such antigen we have identified in lung cancer
is protein PGP 9.5 (Fig. 8) (Brichory et al., manuscript sub-
mitted) [16]. PGP 9.5 was identified as a protein in lung
cancer that induces autoantibodies as part of a study in 
which sera from 64 newly diagnosed patients with lung 
cancer, from 99 patients with other types of cancer and 
from 71 noncancer controls were analyzed for antibody- 
based reactivity against lung adenocarcinoma proteins 
resolved by 2-D PAGE. Gels containing separated pro-
teins were blotted and subsequently hybridized with indi-
vidual sera from patients or controls. Unlike controls, auto- 
antibodies against a protein identified by MS as protein 
gene product 9.5 (PGP 9.5) were detected in sera in 9 out 
of 64 patients with lung cancer. 

Circulating PGP 9.5 antigen was detected in sera from two 
additional patients with lung cancer, without detectable 
PGP 9.5 autoantibodies. PGP 9.5 is a neurospecific poly- 
peptide previously proposed as a marker for nonsmall cell 
lung cancer, based on its expression in tumor tissue. Using 
A549 lung adenocarcinoma cell line, we have demonstrated 
that PGP 9.5 was present at the cell surface, as well as 
secreted. Thus, the findings of PGP 9.5 antigen and/or anti- 
bodies in serum of patients with lung cancer suggest that 
PGP 9.5 may have utility in lung cancer screening and diag- 
nosis, as part of a panel of such proteins or their corres- 
ponding antibodies, which we have identified. 



6 Web pages 

The relational database for storage of sample, image, 
protein information and other related data is being con- 
structed in a stepwise fashion. The construction of a 
comprehensive database to collect all pertinent informa- 
tion is rather challenging and necessitates substantial 
resources. A similar effort in this area is WebGel,
a web based gel database analysis system that con-
tains previously quantified gel data generated from a
stand-alone quantitative gel analysis system [16]. Public
WebGel demonstration databases currently available
can be found at the web site (http://www-lecb.ncifcrf.
gov/webgel WebGel database). The task of web based
retrieval of data from the protein database is rather com- 
plex as there are different kinds of data that may need 
to be retrieved. The microarray data could be stored in 
the database instead of Excel files, and the Access 2000 
database that the MS team utilizes could be transferred to 
the database. Tables are being built to eliminate any 
handwritten collection of data. Developing a database is 
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Figure 8. 2-D PAGE and Wes- 
tern blot analysis of A549 lung 
adenocarcinoma cell proteins. 
Panel 1 shows A549 2-D protein 
pattern after silver staining. The 
boxed area is shown in panel 2, 
in which arrows point to the 
location of PGP 9.5 forms 
(spots P1 to P3) recognized by 
sera from patients with lung 
cancer and the position of the 
form P4 recognized by a poly- 
clonal rabbit anti-PGP 9.5 anti- 
serum, which also recognizes 
P1-P3. Panel 2 shows close-
ups of western blots hybridized
with two different sera from 
patients with lung adenocar- 
cinoma that showed reactivity 
against PGP 9.5 proteins. 



hard because of the complexity and very large amount of
unstructured data generated. There are conflicting
pressures between "using what we've got already"
and constructing something better. Sometimes there is 
a natural break in the data, such as when a shift is 
made from one platform type to another. Then one 
could "pile up" old data and organize it neatly. On the 
other hand, when new technologies are introduced, 
they require new ways of storing the data. The lung 
protein database is continuously evolving to enhance 
the relational schema to be more flexible and compre- 
hensive and to make data processing more robust and 
automatic. 

The lung protein database is a backbone to record pro- 
teome data for many different studies and to mine the 
existing data for new discoveries. The new generation 
UPS provides investigators web-enabled interfaces to 
the laboratory databases and 2-D images with internet 
access. There is certainly a need for sharing information 
in the database on a global basis. We have used internet 
and WWW technologies to provide a distributed process 
with easy-to-use front-end user interface. Figure 9 shows 
a top level view of a web-based process for performing
our studies from a data processing perspective. Some of 
our web pages were developed in Visual InterDev and 
ASP development environment on Microsoft and some 
were developed in Oracle 8i and WebDB web application 
environment on Solaris. As an example, the MS data web 
page is shown in Fig. 10. Detailed "how-to" document- 
ation is provided as on-line help for recently extended 
capabilities of UPS. 




Figure 9. Web-based process of using lung protein data- 
base. 



7 Conclusion 

The value of the database we have constructed depends 
to a large measure on its content, the quality of data and
the ease with which data can be retrieved and analyzed. 
While the amount of data generated is already quite size- 
able, it is likely that the database will continue to undergo 
substantial expansion. Proteins are being identified at a 
rapid pace, thus enhancing our ability to link protein
expression data with RNA based expression data for cor- 
responding genes. As such, the database will play an 
important role in achieving our objective of developing 
novel classification schemes for lung cancer and the 
identification of novel markers for early diagnosis. The 
Figure 10. MS data web page.



database will also serve as a useful resource for other 
investigations of lung biology and of diseases other than 
lung cancer. 
Translational regulation of human p53 gene
expression
In blast cells obtained from patients with acute myelogenous
leukemia, p53 mRNA was present in all the samples examined
while the expression of p53 protein was variable from patient
to patient. Mutations in the p53 gene are infrequent in this
disease and, hence, variable protein expression in the majority
of the samples cannot be accounted for by mutation. In this
study, we examined the regulation of p53 gene expression in
human leukemic blasts and characterized the p53 transcripts in
these cells. We found control both at the level of RNA
abundance and at the level of translation. Four experiments
point towards translational control of human p53 gene
expression. First, there is no correlation between the level of
p53 mRNA and the level of p53 protein expression in blast
cells. Second, in two cell lines with similar levels of p53
protein expression but with different levels of p53 mRNA, we
find that there is preferential association of p53 mRNA with
large polysomes in the cells with less p53 RNA. Third,
translation of p53 transcripts in cell-free extracts is inhibited
by the p53 3'UTR. Fourth, the p53 3'UTR, when present in cis,
can repress translation of a heterologous transcript. These
observations raise the possibility that human p53 mRNA
translation may be regulated in vivo by RNA binding factors
acting on the p53 3'UTR.
Keywords: acute myelogenous leukemia/p53/translational
control



The scarcity of p53 gene mutations in AML is not unique
to this disease. For example, p53 gene mutations are rare
in neuroblastoma, testicular tumors and HPV-positive
cervical cancer. While the p53 gene is most commonly
inactivated through mutation in human cancer, p53 protein
function can also be disrupted through epigenetic
mechanisms including protein-protein interaction
(Scheffner et al., 1990; Momand et al., 1992; Oliner et al.,
1992; Ueda et al., 1995), protein conformational change
(Milner, 1991; Ullrich et al., 1992) and nuclear exclusion
(Moll et al., 1992, 1995). Indeed, two groups have
suggested that inactivation of wild-type p53 protein in
AML occurs through a mechanism involving conformational
change of the protein (Zhu et al., 1993; Zhang
et al., 1992).

The level of p53 protein expression in primary blast
cells obtained from AML patients varies from patient to
patient. In previous studies from this laboratory p53
protein expression was detected in only 45% (34 of
75) blast samples examined by metabolic labelling with
[35S]methionine and immunoprecipitation (Smith et al.,
1986; Benchimol et al., 1989; Slingerland et al., 1991).
Zhang et al. (1992) detected p53 protein expression in
blast samples from 75% (37 of 49) AML patients. Several
reasons may explain the absence or very low level of p53
protein expression in certain blast samples. These include
low levels of p53 mRNA, inhibition of p53 mRNA
translation and extremely rapid turnover of newly
synthesized p53 protein. In this study, we have examined the
regulation of p53 gene expression in human AML blasts
and find control both at the level of RNA abundance and
at the level of translation. Translational regulation is
supported by experiments in which we demonstrate that
the p53 3' untranslated region (3'UTR) can repress
translation of p53 RNA and of heterologous transcripts in
cell-free extracts.



Introduction 

Human acute myelogenous leukemia (AML) is a clonal 
disease arising in a very early hematopoietic progenitor 
cell following multiple carcinogenic events CW>g«an? 
Tal , 1978; Fialkow et al. 1987). Mutadon _of Jte p53 
Ln« suppressor gene occurs ^frequently in die bUst 
cells of AML patients (Fenaux et al. 1991. 992. 
Slingeriand et al, 1991; Sugimoto etal , 1991. 1993, 
Zhang et al, 1992; Trecca et aL, 1994; Wattel^ etal 
1994fLai et al, 1995). P 53 mutations have been detected 
in -10% of all AML patients, mostly in patients with Jp 
monosomy who had lost the normal .remaining P 53 . allele 
(Lai et al. 1995). These studies demonstrate that p53 
mutations are not required for the development oTAML. 
Mutations that do arise, however, are generally recessive 
in nature, indicating a strong selective pressure to eliminate 
completely wild-type p53 protein function. 
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Results 

Expression of p53 protein in human AML
Leukemic blast cells from AML patients and three human
acute leukemia cell lines, OCI-M2, OCI/AML-3 and
OCI/AML-4, were characterized for p53 protein expression
by metabolic labelling and immunoprecipitation. OCI-M2 is
a human erythroleukemia cell line (Papayannopoulou
et al., 1988) previously shown to contain a point
mutation in the p53 coding region at codon 274 and to
have lost the homologous wild-type p53 allele (Slingerland
et al., 1991). OCI/AML-3 and OCI/AML-4 cell lines were
derived from the primary blasts of two AML patients
(Wang et al., 1989). The full-length p53 transcripts in
these cells were amplified by RT-PCR, and the products
directly sequenced. We found that the p53 transcripts in
both cell lines were wild-type throughout their coding
regions as well as through their 5'- and 3'UTRs. The only
difference detected in the p53 transcripts expressed in
OCI/AML-3 and OCI/AML-4 cell lines was the recognized
polymorphism at codon 72 (Matlashewski et al., 1987)
resulting in an arginine residue in OCI/AML-3 and a
proline residue in OCI/AML-4 at position 72.

© Oxford University Press

Fig. 1. Expression of p53 protein in human leukemia cells. (A) Cell
lines OCI-M2, OCI/AML-3 and OCI/AML-4, and blast cells from
AML patients were metabolically labelled with [³⁵S]methionine for
15 min at 37°C. Cell extracts were prepared and portions representing
equal amounts of trichloroacetic acid-insoluble radioactivity
(10⁷ c.p.m.) were immunoprecipitated with the control monoclonal
antibody (PAb419) or with monoclonal antibodies against p53
(PAb421). (B) Detection of p53 protein in 5 × 10⁵ cells by Western
immunoblotting and ECL using PAb1801 monoclonal antibodies.

The level of protein expression measured by metabolic
labelling and immunoprecipitation is dependent primarily
on the rate of protein synthesis, the rate of protein
degradation and the amount of mRNA available for
translation. To minimize the contribution of protein
half-life on the detection of p53 protein synthesis during the
metabolic labelling assay, cells were exposed to a short
15 min pulse of [³⁵S]methionine at 37°C followed by
immediate lysis on ice in the presence of protease
inhibitors. Radiolabelled cell extracts prepared in this way were
then subjected to immunoprecipitation with p53-specific
antibodies. p53 protein with a half-life much less than 15

p53 protein synthesis was detected in OCI/AML-3,
OCI/AML-4 and in OCI-M2 (Figure 1A) as well as in seven
of 16 blast samples tested; three representative examples
are shown in Figure 1A.

The steady-state level of p53 protein in the three cell
lines was determined by Western blot analysis using
PAb1801. Densitometric scanning of the blot shown in
Figure 1B revealed that the amount of p53 in OCI/AML-3
and OCI/AML-4 was similar and -iWold lower
than in OCI-M2. The high level of p53 protein in OCI-M2
was expected since mutant p53 polypeptides usually
have much longer half-lives than wild-type p53 proteins
and as a result mutant p53 polypeptides accumulate
intracellularly.

Fig. 2. Northern blot analysis of p53 mRNA in human AML cells.
20 µg of total RNA isolated from cell lines or patient blast samples
was separated on a 1% agarose gel containing formaldehyde,
transferred to nitrocellulose and hybridized with ³²P-labelled human
p53 cDNA. After autoradiography, the filters were hybridized with
a probe specific for 18S ribosomal RNA.
The relative abundance of p53 mRNA was determined by
phosphorimage analysis after normalizing to the value of 18S
ribosomal RNA in each sample.

Fig. 3. Relative abundance of p53 mRNA in cells that do or do not
express detectable p53 protein. p53 protein synthesis was measured in
AML blast samples by metabolic labelling with [³⁵S]methionine for
15 min and immunoprecipitation. p53 protein synthesis was detected
in seven of the 16 samples. p53 mRNA was measured by
Northern blot analysis as described in the legend to Figure 2.

Expression of p53 mRNA in human AML
To determine whether the differences in p53 protein
expression in leukemic blasts reflected differences in the
abundance of p53 mRNA, RNA was extracted from
blast samples and cell lines, and subjected to Northern
blot analysis. The relative abundance of p53 mRNA in
cells was estimated by phosphorimage analysis after
normalizing to the value of 18S ribosomal RNA in each
sample. The results are shown in Figure 2 and indicate
that the 16 AML blast samples examined synthesized a
single species of full-length p53 mRNA ~2.8 kb in size.
The relative amount of p53 mRNA in the 16 samples
varied over a 27-fold range. No correlation was evident
between p53 protein expression (on the basis of the 15
min metabolic labelling assay) and the level of p53 mRNA.

OCI/AML-3 and OCI/AML-4 cells contained similar
amounts of p53 protein. However, the RNA blot shown
in Figure 2 indicated that the abundance of p53 mRNA
was l£d higher in OCI/AML-3 than in OCI/AML-4. A
4- to 8-fold difference in p53 RNA was seen in repeated
experiments. We conclude that
p53 RNA levels and p53 protein expression are variable
in AML blasts and cell lines, and that the level of p53
protein expression is not related to the amount of p53
mRNA in these cells.

were collected and the amount of p53 mRNA in each
fraction was determined by dot-blot hybridization analysis with a

respect to the gradient was estimated using a polysome preparation

and polysomes are indicated. Error bars represent the standard error of
the mean from three separate experiments.
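
The normalization used for the Northern analysis can be illustrated with a short sketch. It assumes one p53 signal and one 18S ribosomal RNA signal per lane; the function names and the counts below are hypothetical and are not data from this study.

```python
def relative_abundance(p53_signal, rrna_signal):
    """Normalize each p53 band to the 18S rRNA band of the same lane."""
    if len(p53_signal) != len(rrna_signal):
        raise ValueError("one 18S value is needed per p53 value")
    return [p / r for p, r in zip(p53_signal, rrna_signal)]


def fold_range(values):
    """Ratio of the highest to the lowest normalized value."""
    return max(values) / min(values)


# Hypothetical phosphorimager counts for four lanes (illustrative only).
p53_counts = [1200.0, 300.0, 2400.0, 800.0]
rrna_counts = [1000.0, 1500.0, 1000.0, 2000.0]

normalized = relative_abundance(p53_counts, rrna_counts)
print("normalized p53 signal per lane:", normalized)
print("fold range across lanes:", fold_range(normalized))
```

Dividing by the 18S signal corrects for unequal loading, so that the fold range reflects differences in p53 mRNA rather than in the amount of total RNA applied to each lane.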

Association of p53 mRNA with polysomes
p53 gene expression is

polysomes in

mRNA is more translationally

results suggest, then a larger proportion of the p53 mRNA
present in OCI/AML-4 should be associated with
polysomes compared with OCI/AML-3. Cells were
collected and lysed in the presence of cycloheximide and
MgCl₂ which stabilize the association of ribosomes with
mRNA. The lysates were sedimented through a
sucrose gradient and fractions were collected. RNA was
extracted from each fraction and analyzed for the presence
of p53 mRNA by dot-blot hybridization with a ³²P-labelled
p53 cDNA probe. The gradients were calibrated with
polysomes prepared from lysates by precipitation with
100 mM MgCl₂. Polysomes were found at the bottom of
the gradient in fractions 5-10, while monosomes were
found in fractions 1-4. p53 mRNA from OCI/AML-4
cells was associated with larger polysomes than was p53
mRNA from OCI/AML-3 cells (Figure 4). In OCI/AML-4
cells, 39% of the p53 mRNA was found in fractions
7-10 containing high molecular weight polysomes, while in
OCI/AML-3 cells 21% of the p53 mRNA was found in
these same fractions. As an internal control, the distribution
of ribosomal protein L35 RNA was compared and shown
to be identical in OCI/AML-3 and OCI/AML-4 (data
not shown).
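
The percentage of p53 mRNA in the heavy polysome fractions can be computed from the dot-blot signal of each gradient fraction, as sketched below. The intensities are hypothetical and the function name is ours; only the fraction boundaries (5-10 for polysomes, 7-10 for high molecular weight polysomes) are taken from the text.

```python
def percent_in_fractions(signal_by_fraction, first, last):
    """Percentage of total signal in fractions first..last (1-based, inclusive)."""
    total = sum(signal_by_fraction)
    if total <= 0:
        raise ValueError("total signal must be positive")
    selected = sum(signal_by_fraction[first - 1:last])
    return 100.0 * selected / total


# Hypothetical dot-blot intensities for gradient fractions 1-10.
profile = [4, 8, 12, 16, 20, 15, 10, 8, 5, 2]

print("polysomes (fractions 5-10):", percent_in_fractions(profile, 5, 10))
print("large polysomes (fractions 7-10):", percent_in_fractions(profile, 7, 10))
```

Expressing each fraction as a share of the summed signal makes the profiles of two cell lines comparable even when their total p53 mRNA content differs.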

Analysis of the 5' end of p53 mRNA
The human p53 gene has been shown to have a cluster of
six or seven major transcription initiation sites and several
minor sites lying further upstream (Tuck and Crawford,
1989). Transcripts initiating from the minor sites would
have a longer 5'UTR with potential to form a stable
stem-loop structure close to the 5' cap. Such structures would
not be expected to form in transcripts initiating from the
major start sites. 5' stem-loop structures were described
in rodent p53 mRNA (Bienz et al., 1984; Bienz-Tadmor
et al., 1985). Recently, mouse p53 protein was shown to
bind to the 5'UTR and to inhibit translation of its own
mRNA in an in vitro assay system (Mosner et al., 1995).
Stable stem-loop structures in the 5'UTR regions of a
number of mRNA transcripts have been shown to inhibit
translation initiation by interfering with
translation initiation factors or by serving as binding sites
for regulatory proteins that inhibit translation (Feng and
Holland, 1988; Fu et al., 1991; Melefors and Hentze,
1993; Pause et al., 1993).

To determine if the low level of p53 protein expression
in certain blasts was the result of transcription initiation
from the minor start sites, the 5' ends of p53 mRNA present
in different blast samples and cell lines were mapped
using an RNase protection assay. A 729 nucleotide
antisense probe containing genomic sequence from
the p53 promoter region fused with cDNA sequence
was synthesized with
SP6 RNA polymerase in the presence of [³²P]UTP (Figure
5A). This probe would yield protected p53 fragments of
385 nucleotides corresponding to transcripts initiating
from the major start site and 449 nucleotides corresponding
to transcripts originating from the most 5' of the minor
start sites. Total RNA extracted from OCI/AML-3 and
OCI/AML-4 cell lines and from seven AML blast samples
was examined. After digestion, the protected fragments
were resolved by electrophoresis on a 6% polyacrylamide
gel. Figure 5B shows a major
protected fragment in all the samples of ~385
nucleotides in length, indicating a common site for initiation
of p53 gene transcription in leukemic blasts at the major
start site. These data indicate that, in contrast with murine
p53 mRNA, stable secondary structures are unlikely to
exist at the 5' end of human p53 mRNA.

Fig. 5. RNase protection assay. (A) The map of the p729 plasmid. The
p729 plasmid was constructed as described under Materials and
methods. After linearization with HindIII, a 729 nucleotide antisense
RNA probe was generated by transcription with SP6 RNA polymerase
yielding protected p53 fragments of ~385 nucleotides due to p53
transcripts initiating from one of the major start sites (a) and 449
nucleotides due to p53 transcripts initiating from the most 5' of the
minor transcription start sites (b). (B) The 729 nucleotide [³²P]UTP-
labelled antisense RNA probe was annealed to 30 µg of total RNA
extracted from OCI/AML-3 and OCI/AML-4 cell lines and seven
AML blast samples before digestion with RNase A and RNase T1.
The protected fragments were separated by electrophoresis on a 6%
polyacrylamide-3 M urea gel and visualized by autoradiography. The
positions and size (nucleotide length) of 5' end-labelled fragments of
digested pBR322 plasmid DNA are indicated on the left. The
bottom arrow indicates the position of the major protected fragment;
the top arrow indicates the undigested probe.

Fig. 6. In vitro translation of synthetic p53 RNA containing variable
portions of the 3'UTR. (A) Plasmid template used to synthesize p53
RNA in vitro. The p2516 plasmid was constructed by inserting the
entire 2.5 kb wild-type p53 cDNA sequence downstream of the
bacteriophage SP6 promoter in a pSP64-derived plasmid. Transcription
from the SP6 promoter present in p2516 leads to the production of
transcripts in which the first 10 nucleotides are derived from plasmid
sequences while the remaining nucleotides are derived from the p53
gene beginning at the +7 position of native p53 transcripts initiating
from one of the major transcription start sites (Tuck and Crawford,
1989). Linearization of p2516 at the EcoRI site before in vitro
transcription generates a full-length, Alu-containing p53 transcript
(T2516); linearization at the BamHI site provides a template for the
synthesis of a truncated p53 transcript missing a portion of the 3'UTR
containing the Alu sequence (T2034). Both transcripts were
polyadenylated in vitro to generate T2516An and T2034An. The open
rectangles shown on the transcripts represent the position of the p53
coding region. (B) 50 ng of the in vitro-synthesized T2034, T2516,
T2034An and T2516An p53 RNAs were translated in a rabbit
reticulocyte lysate at 30°C for 30 min in the presence of
[³⁵S]methionine followed by immunoprecipitation, SDS-PAGE and
autoradiography. In the 3× T2516 and 3× T2516An lanes, 150 ng of
T2516 or T2516An RNA was added to the in vitro translation
reaction. The right panel presents the results of a Northern blot in
which 50 ng of synthetic p53 RNA was applied to an agarose-
formaldehyde gel, blotted and hybridized to ³²P-labelled human p53
cDNA.



Analysis of the 3' end of p53 mRNA
Human p53 mRNA contains a long 3'UTR of 1176
nucleotides with an Alu-like repetitive sequence element
of ~470 bp located immediately upstream of the poly(A)
tail (Matlashewski et al., 1984). The Alu-like sequence is
in the reverse transcriptional orientation with respect to
the p53 gene. Furthermore, the Alu-like sequence is
missing in murine p53 transcripts and it interrupts a region
in human p53 mRNA which shows homology to mouse
p53 mRNA. When analyzed with the FOLD program of
GCG, the Alu-like element in the 3'UTR of human p53
mRNA is predicted to form an independent secondary
structure that does not have long-range interactions with
other regions of p53 mRNA. In the presence of a poly(A)
tail, the secondary structure formed by the Alu-like element
is predicted to remain essentially intact except that a 50
nucleotide U-rich sequence at the 5' boundary of the
Alu-like sequence will interact with the poly(A) tail. The
extended base pairing between U and A residues will
further stabilize the secondary structure formed by the
Alu-like element. To determine whether or not the
Alu-like repeat present in human p53 mRNA might constitute
a negative regulatory element during translation, a series
of in vitro transcription-translation experiments was
performed.

An SP6-derived plasmid containing human wild-type
p53 cDNA including the entire 3'UTR was constructed
(p2516 in Figure 6A). p2516 was linearized with EcoRI
or with BamHI and used as a template for in vitro
transcription. In some reactions, a poly(A) tail of
200-300 adenylic acid residues was added to synthetic p53
RNA using poly(A) polymerase. In this way, four synthetic
p53 transcripts were generated; T2516An and T2516
represent full-length, Alu-containing transcripts with or
without a poly(A) tail; T2034An and T2034 represent
shorter, Alu-deficient p53 transcripts with or without a
poly(A) tail. These transcripts were then used as templates
for translation in a rabbit reticulocyte lysate containing
[³⁵S]methionine. p53 protein synthesized in vitro was
immunoprecipitated with PAb421 monoclonal antibody
and visualized by autoradiography (Figure 6B). The
amount and integrity of the synthetic p53 RNAs added to
the in vitro translation reactions was monitored by agarose
gel electrophoresis and Northern blotting as shown in the
right panel of Figure 6B. Densitometry tracing of the
data indicated that the Alu-containing, non-polyadenylated
transcript T2516 was translated ~3-fold less efficiently than
the Alu-deficient, non-polyadenylated transcript T2034. In
addition, the polyadenylated, Alu-containing transcript
T2516An was translated ~20-fold less efficiently than the
polyadenylated, Alu-deficient transcript T2034An. These
data indicate that the Alu-like element present in the p53
3'UTR can inhibit p53 mRNA translation in vitro, even
in the absence of a poly(A) tail. The predicted interaction
of the poly(A) tail with the Alu-like element appears to
increase further the inhibition of translation.
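
A comparison of this kind can be expressed as protein signal per unit of input RNA, followed by a ratio between two transcripts. The sketch below is only an illustration of that arithmetic: the densitometry readings are hypothetical, and the function and key names are ours rather than part of the analysis reported here.

```python
def translation_efficiency(protein_signal, rna_signal):
    """Protein signal per unit of input RNA signal."""
    if rna_signal <= 0:
        raise ValueError("RNA signal must be positive")
    return protein_signal / rna_signal


def fold_repression(reference, test):
    """How many times less efficiently `test` is translated than `reference`."""
    return reference / test


# Hypothetical densitometry readings in arbitrary units: (protein, RNA).
readings = {
    "without_element": (900.0, 100.0),
    "with_element": (330.0, 110.0),
}

efficiency = {
    name: translation_efficiency(protein, rna)
    for name, (protein, rna) in readings.items()
}
print(fold_repression(efficiency["without_element"], efficiency["with_element"]))
```

Dividing by the RNA signal first guards against reading a difference in the amount of template added to the lysate as a difference in translation.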

To test further the inhibitory activity of the p53 3'UTR,
we examined the ability of the p53 3'UTR to control the
translation of a heterologous RNA. The Alu-containing
p53 DNA fragment extending from nucleotides 2034
to 2516 was excised from plasmid p2516 and inserted
downstream of a heterologous gene (CAT gene) in an
SP6-based plasmid vector to generate the plasmid
pCAT-Alu (Figure 7A). In vitro transcription and translation
revealed that non-polyadenylated CAT-Alu RNA was
translated 5-fold less efficiently than non-polyadenylated
CAT transcripts lacking the Alu sequence (Figure 7B).
When a different region of the p53 3'UTR (nucleotides
1465-2034 in plasmid p2516) with approximately the
same length as the Alu-containing fragment was inserted
downstream of the CAT gene, no effect on CAT translation
was observed (CAT-BS in Figure 7B). The ability of the
Alu-containing segment of the p53 3'UTR to act on a
heterologous transcript indicates that it likely represses
translation independently of upstream sequences.

The inhibitory activity of the Alu-like element on p53
translation was likely the result of its action in cis and
not simply due to non-specific inhibition of translation,
since a 3-fold increase in the amount of Alu-containing
transcript added to the reticulocyte lysate resulted in a
corresponding increase in the amount of p53 protein
synthesized (Figure 6B). Furthermore, when 200 ng of
luciferase RNA was added to a reticulocyte lysate together
with 200 ng of CAT-Alu or CAT-BS RNA, there was little
difference in the amount of luciferase synthesized (Figure
7C). Similarly, when 200 ng of luciferase RNA was added
to a reticulocyte lysate, either alone or mixed with 200 ng
of T2034 or T2516An RNA, there was little difference in
the amount of luciferase synthesized (data not shown).

To confirm that the decrease in p53 protein synthesis
from Alu-containing p53 RNAs was due to translational
regulation and not due to preferential RNA degradation
in the reticulocyte lysate, adenylated T2034 and T2516
synthetic transcripts were added to the rabbit reticulocyte
lysate under the same conditions as those used for in vitro
translation. After incubation for 15 or 60 min, RNA was
extracted from the lysate and the amount of synthetic p53
RNA present in the lysate determined by Northern blot
analysis. Enhanced degradation of the Alu-containing
transcript was not observed (Figure 8). We conclude that
a segment of the p53 3'UTR encompassing the Alu-like
element is capable of repressing translation in vitro.

Fig. 7. The p53 Alu-like element can inhibit translation of a
heterologous CAT transcript. (A) Plasmids used to generate CAT
transcripts in vitro. (B) 200 ng of in vitro-synthesized CAT, CAT-Alu,
and CAT-BS transcripts were translated in a rabbit reticulocyte lysate
at 30°C for 30 min in the presence of [³⁵S]methionine. The reactions
were stopped by adding an equal volume of the 2× protein sample
buffer, heated to 100°C for 5 min and analyzed by SDS-PAGE and
autoradiography. An ethidium bromide-stained agarose gel
demonstrating the integrity and amount of synthetic transcripts that
were added to the in vitro translation reaction is shown below.
(C) 200 ng of luciferase RNA was translated in a rabbit reticulocyte
lysate either alone or in the presence of 200 ng of CAT-BS or 200 ng
of CAT-Alu. Reaction mixtures were incubated in the presence of
[³⁵S]methionine at 30°C for 30 min and processed as in (B). The
in vitro-synthesized luciferase protein is shown in the top panel;
RNA used for in vitro translation is shown in the ethidium bromide-
stained agarose gel in the bottom panel.

Discussion 

The observation that wild-type p53 protein expression in
leukemic blast cells does not correlate with the level of
p53 mRNA mirrors findings reported previously for blasts
and other human cell types (Matlashewski et al., 1986;
Kastan et al., 1991a; Slingerland et al., 1991; Sasano
et al., 1992; Hsu et al., 1993). The absence of detectable
p53 protein in cells expressing abundant levels of
wild-type p53 mRNA has usually been attributed to the short
half-life of p53 protein in normal cells (Rogel et al.,
1985). A similar situation exists in papillomavirus (HPV)-
infected cells such as HeLa cells where p53 protein is not
detected even though these cells produce p53 mRNA and
this RNA is associated with polysomes (Matlashewski
et al., 1986). The enhanced degradation of newly
synthesized p53 protein in HeLa cells was shown to be promoted
by the papillomavirus E6 protein which is expressed
constitutively in these cells (Scheffner et al., 1990).

Fig. 8. Stability of synthetic human p53 RNAs in rabbit reticulocyte
lysates. 100 ng of adenylated T2034 (A) and T2516 (B)
RNA was added to the rabbit reticulocyte lysate and incubated at 30°C
for 15 or 60 min under the same conditions used for in vitro
translation. RNA present in the lysates was then extracted and loaded
on a 1% agarose-formaldehyde gel. The 0 min time point represents
100 ng of synthetic RNA loaded directly on the gel. The amount of
p53 RNA in each sample was then determined by Northern blotting
using a ³²P-labelled human p53 cDNA. The lower panel shows the
28S and 18S ribosomal RNAs recovered from the rabbit reticulocyte
lysates detected by ethidium bromide staining of the gel.

In this report, we present data showing that differences
in p53 mRNA abundance exist in AML blasts and that
these differences cannot explain the heterogeneity in the
level of p53 protein expression in leukemic blast cells.
Using a metabolic labelling assay in which blasts from
different AML patients were pulse-labelled with
[³⁵S]methionine for 15 min to minimize the contribution of
protein half-life on the detection of p53 protein synthesis,
we found differences in the level of p53 protein expression
in blast samples. These observations raised the possibility
that p53 gene expression may be regulated at the
translational level in certain human cells. We tested this
possibility by analyzing the distribution of p53 mRNA on
polysomes in vivo and by examining p53 RNA translation
in vitro.

We have used two AML cell lines, OCI/AML-3 and
OCI/AML-4, that contain similar amounts of wild-type
p53 protein even though OCI/AML-3 contains 4- to 8-fold
more p53 mRNA. Comparison of the polysome profile of
these cells indicated that a greater proportion of the p53
mRNA was associated with larger polysomes in
OCI/AML-4 than in OCI/AML-3. p53 mRNA in both of these
cell lines as well as in blasts from different AML patients
is present as a single, full-length species of ~2.8 kb
that initiates from a common transcription start site and
contains similar sequence and structural elements.

Transcription-translation experiments in vitro indicated
that the p53 3'UTR contains a negative regulatory domain
that is capable of repressing translation in vitro. A region
of the 3'UTR consisting of ~500 nucleotides and
containing an Alu-like element is capable of repressing
translation of p53 mRNA and of a heterologous transcript.
The p53 3'UTR, when present in cis, repressed translation
of polyadenylated as well as non-polyadenylated
transcripts. Accordingly, we suggest that the Alu-like element,
possibly through its secondary structure, is capable of
repressing p53 mRNA translation. In addition, interaction
of the Alu-like element with the poly(A) tail may repress
the latter's function in translation. Experiments are in
progress to map precisely this regulatory element in the
p53 3'UTR and to determine if the p53 3'UTR plays a
similar role in regulating translation in vivo.

Our finding that p53 protein expression in AML blasts
is controlled, at least in part, through mechanisms acting
at the translational level, raises the possibility that
translational regulation may provide an epigenetic mechanism
to reduce or even eliminate wild-type p53 protein function
in leukemic blasts. In preliminary experiments to address
this point, we have exposed blast cells that express little
or no detectable p53 protein to 6 Gy of ionizing radiation
and have observed increased steady-state levels of p53
protein at 1.5 h after irradiation (data not shown).
Genotoxic agents have been shown previously to increase the
level and/or activity of p53 protein through a
post-transcriptional mechanism that is not well understood
(Kastan et al., 1991b; Fritsche et al., 1993; Lu and Lane,
1993; Zhan et al., 1993). Hence, blast cells retain the
ability to up-regulate p53 expression in response to
genotoxic stress. At least under these conditions, p53 function
may not be lost. This type of analysis, however, does not
address the function of p53 in proliferating cells that have
not been exposed to genotoxic stress. In this regard,
previous studies from our laboratory demonstrated a highly
significant correlation between p53 protein expression in
leukemic blast cells and the secondary plating efficiency
of these cells (Smith et al., 1986). The latter provides an
estimate of the self-renewal capacity of progenitor cells
in the blast population. Deregulated p53 expression might,
therefore, be expected to affect the self-renewal capacity
of blasts in the absence of genotoxic stress.

Accumulating evidence demonstrates the involvement
of the 3'UTR in translational control (Jackson, 1993). The
demonstration that the 3'UTR of certain transcripts can
control mRNA localization and polyadenylation provides
a mechanism for translational regulation (Huarte et al.,
1992; Gavis and Lehmann, 1994). In addition, specific
sequences within 3'UTRs have been shown to repress
translation (Goodwin et al., 1993; Evans et al., 1994;
Kwon and Hecht, 1993). RNA-protein interactions are
likely to be involved in 3'UTR-dependent translational
repression. Indeed, a protein that binds specifically to the
3'UTR of protamine 2 mRNA and represses its translation
has been identified (Kwon and Hecht, 1993). If the p53
3'UTR can be shown to regulate p53 mRNA translation
in vivo, it is possible that trans-acting factors (missing or
inactive in reticulocyte lysates) activate components of the
translational machinery to bypass this negative regulatory
domain on human p53 mRNA. Such trans-acting factors
could interact directly with p53 mRNA to enhance its rate
of translation. Alternatively, trans-acting factors directed
to the p53 3'UTR (that are also present in reticulocyte
lysates) may act as repressors of translation. Differences
in the level of p53 protein synthesis among AML blasts
and possibly other human cells could, therefore, be
determined by differences in the level or activity of these
regulatory molecules.

Materials and methods

OCI/AML-3 and OCI/AML-4 cell lines were derived from primary
blasts of two AML patients (Wang et al., 1989). The OCI-M2 cell line
was derived from the primary blasts of a patient whose erythroleukemia
represented the end stage of a previously identified myelodysplastic
syndrome (Papayannopoulou et al., 1988). OCI/AML-3 and OCI-M2
cells were grown in alpha-modified minimum essential medium (α-MEM)
containing 10% fetal calf serum (FCS) (GIBCO). The OCI/AML-4 cells
were grown in α-MEM containing 10% FCS and 10% conditioned
medium obtained from the human bladder carcinoma cell line 5637
(5637-CM) (Wang et al., 1989). The AML blast cells were obtained
directly from AML patients. The mononuclear cell fraction of peripheral
blood was collected after separation through Ficoll-Hypaque (Pharmacia)
(1.077 g/ml) and T-lymphocyte depletion (Minden et al., 1979). These
cells were stored frozen in liquid nitrogen before use.

Metabolic labelling and immunoprecipitation
The blast cells of AML patients were thawed and incubated for 2 days
at 37°C in α-MEM containing 10% FCS and 10% 5637-CM before
metabolic labelling. 1 × 10⁷ cells were labelled with 0.2 mCi
[³⁵S]methionine (DuPont NEN Research Products) in 0.5 ml α-MEM lacking
methionine and containing 10% dialysed FCS at 37°C for 15 min. Cells
were then immediately pelleted, the radioactive medium removed, and
the cells lysed on ice in a solution containing 25 mM Tris pH 7.4,
50 mM NaCl, 0.5% sodium deoxycholate, 2% NP40, 0.2% SDS, 0.5 mM
phenylmethylsulfonyl fluoride (PMSF), 1 µg/ml leupeptin and 1 µg/ml
aprotinin for 20 min. Lysates were cleared by centrifugation, the
supernatant was retained and incubated with 5 µg of a non-specific
IgG2a mouse monoclonal antibody (Sigma) for 60 min on ice. These
were then reacted with 0.5 ml of a 10% suspension of formalin-treated
Staphylococcus aureus Cowan 1 cells (Pansorbin, Calbiochem-Behring)
for 30 min on ice, followed by centrifugation and retention of the
supernatant. Portions of precleared lysates containing equal numbers of
trichloroacetic acid-insoluble counts (10⁷ c.p.m.) were diluted in
NET/GEL buffer (150 mM NaCl, 5 mM EDTA pH 8.0, 50 mM Tris pH 7.4,
0.05% NP40, 0.02% sodium azide, 0.25% gelatin) and
immunoprecipitated on ice for 2 h with PAb421 monoclonal antibodies against p53
protein or control PAb419 antibodies (Harlow et al., 1981). The immune
complexes were collected on 60 µl prewashed protein A-Sepharose
beads (Pharmacia), washed three times with NET/GEL buffer, and eluted
into 30 µl protein sample buffer (2% SDS, 10% glycerol, 0.1%
bromophenol blue, 25 mM Tris pH 6.8, 0.1 M dithiothreitol) by boiling
for 10 min. The Sepharose beads were removed by centrifugation, the
samples were loaded on a 10% polyacrylamide gel containing SDS and
proteins were resolved by electrophoresis at 45 mA. Gels were fixed in
7.5% acetic acid and 25% methanol for 30 min before drying and
exposure to X-ray film (DuPont NEN Research Products).
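
Taking portions with equal trichloroacetic acid-insoluble counts amounts to dividing the target counts by the specific activity of each lysate. The following sketch shows that calculation; the counts per µl and the sample names are hypothetical, and only the 10⁷ c.p.m. target comes from the protocol above.

```python
def volume_for_counts(target_cpm, cpm_per_ul):
    """Volume (in µl) of lysate that contains target_cpm TCA-insoluble counts."""
    if cpm_per_ul <= 0:
        raise ValueError("counts per µl must be positive")
    return target_cpm / cpm_per_ul


# Hypothetical TCA-insoluble counts per µl of precleared lysate.
lysates = {"sample_A": 50_000.0, "sample_B": 80_000.0}
target = 1e7  # c.p.m. per immunoprecipitation, as stated in the protocol

volumes = {
    name: volume_for_counts(target, cpm) for name, cpm in lysates.items()
}
for name, volume in volumes.items():
    print(f"{name}: {volume:.0f} µl")
```

Equalizing incorporated counts rather than cell number or volume means that each immunoprecipitation starts from the same amount of newly synthesized protein.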

Western blot analysis 

5 × 10⁴ cells were lysed directly in an equal volume of 2× protein sample
buffer. The extracts were passed through a 21-gauge needle several
times to reduce viscosity and boiled for 10 min before electrophoresis
at 45 mA on a 10% polyacrylamide gel containing SDS. Resolved
proteins were transferred to a nitrocellulose membrane (Schleicher
& Schuell), and the abundance of p53 protein was estimated by
immunoblotting with a human p53-specific monoclonal antibody
PAb1801 (Banks et al., 1986). Bound antibody was detected using the
enhanced chemiluminescence detection system (DuPont NEN Research
Products) according to the manufacturer's instructions.

Northern blot analysis 

Total cellular RNA was isolated using the guanidinium thiocyanate-
cesium chloride method (Chirgwin et al., 1979). 20 µg of total RNA
was separated by electrophoresis on a 1% agarose gel containing 6%
formaldehyde and transferred to a nitrocellulose membrane (Schleicher
& Schuell). The blots were hybridized with cDNA probes labelled with
[α-³²P]dCTP in a random priming reaction (Feinberg and Vogelstein,
1983), washed and exposed to X-ray film. The amount of RNA
was determined with a Molecular Dynamics PhosphorImager using
Multiquant software. The human p53 probe was the XbaI-EcoRI fragment
of p53 cDNA from the pR4-2 plasmid (Harlow et al., 1985); the L35
probe was the PstI-BamHI fragment from the human ribosomal protein
L35 cDNA (Herzog et al., 1990); the GAPDH probe was a 1.3 kb PstI
fragment of rat GAPDH cDNA (Fort et al., 1985); the 18S ribosomal
RNA probe was the EcoRI fragment from the human ribosomal RNA
gene (Torczynski et al., 1985).

Genomic DNA preparation 

Genomic DNA from OCI/AML-3 and OCI/AML-4 cell lines was isolated
following a modification of the procedure described by Kupiec et al.
(1987). 3 x 10^7 cells were washed with ice-cold PBS buffer, resuspended
in 3 ml of lysis buffer (20 mM EDTA pH 8.0, 100 µg/ml proteinase K,
0.5% sarkosyl) and incubated at 50°C for 3 h. DNA was extracted
with phenol/chloroform, dialysed against 50 mM Tris-HCl pH 8.0,
10 mM EDTA, 10 mM NaCl at 4°C, and then treated with RNase A
(100 µg/ml) at 37°C for 3 h. DNA was again extracted with phenol/
chloroform and dialysed against 10 mM Tris pH 7.4, 1 mM EDTA.
DNA concentration was determined by measuring the absorbance at
260 nm. 
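
As a rough illustration of the absorbance measurement described above, the following Python sketch converts an A260 reading into a DNA concentration. It assumes the common convention that one A260 unit corresponds to about 50 µg/ml of double-stranded DNA in a 1 cm cuvette. That factor, the function name and the example dilution are assumptions made for illustration and are not stated in the procedure.

```python
# Conventional conversion factor (an assumption, not given in the text):
# 1.0 A260 unit ~ 50 ug/ml of double-stranded DNA at a 1 cm path length.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_concentration(a260, dilution_factor=1.0, path_length_cm=1.0):
    """Estimate dsDNA concentration in ug/ml from an A260 reading.

    Hypothetical helper: scales the reading by the conversion factor,
    corrects for sample dilution and normalises to the path length.
    """
    if a260 < 0 or dilution_factor <= 0 or path_length_cm <= 0:
        raise ValueError("a260 must be >= 0; dilution and path length must be > 0")
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor / path_length_cm


if __name__ == "__main__":
    # Example: a 1:20 dilution of the stock reads A260 = 0.5
    print(dsdna_concentration(0.5, dilution_factor=20))  # 500.0 ug/ml
```

The estimate is only as good as the purity of the preparation, which is presumably why the protocol removes RNA and protein before the reading is taken.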

Amplification of p53 sequences from RNA and DNA
20 ng of total RNA was precipitated with ethanol and resuspended in a
30 µl reaction containing 300 ng of oligo(dT) primer (Amersham
International), 50 mM Tris-HCl pH 8.3, 77 mM KCl, 3 mM MgCl2,
3 mM dithiothreitol, 3 mM dNTP, 30 units of RNAguard (Pharmacia)
and 200 units of Moloney murine leukemia virus reverse transcriptase
(GIBCO-BRL) and incubated at 42°C for 60 min. The first strand cDNA
was then used as the template for amplification by PCR using Taq
polymerase (Promega). PCR amplification was performed with 10 µl of
each first strand cDNA as the template and 40 cycles of denaturation
(94°C, 1 min), annealing (64°C, 30 s), and elongation (72°C, 1 min).
The following p53-specific primers were used for amplifying the complete
coding region and the 3'UTR: 5'SX1 (sense, exon 1,
GACACTTTGCGTTCGGGCTGGGAG), 5'SX5A (sense, exon 5,
GAGCGCTGCTCAGATAGCGATG), 3'SX11 (sense, exon 11,
GAAGGGCCTCACTCAGACTGAC), 3'AX-6 (antisense, exon 6,
AGATGCTGAGGAGGGGCCAGAC), JS-3 (antisense, exon 11,
GAGGGAGAGATGGGGGTGGGAGGCTGTC) and AS-4 (antisense, exon 11,
GCCAGCAAAGTTTTATTGTAAAATAAG). The 5'UTR and sequences further upstream
were amplified from 1 µg genomic DNA using the following pair of
p53-specific primers: 5'UTR-1 (sense, promoter region,
ACCTAAGCTTGTCATGGCGACTGTCCAGCTTTG) and p-EX (antisense, exon 1,
CCAATCCAGGGAAGCGTGTCACCG).
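
Because the primer sequences above are broken across lines in the printed text, a short check of each rejoined sequence can be useful. The Python sketch below reports length, GC fraction and reverse complement for primer 5'SX1 as rejoined from the text. The helper names are hypothetical, and the sketch only handles unambiguous A/C/G/T bases.

```python
# Watson-Crick complements for unambiguous DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def gc_fraction(seq):
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


# Primer 5'SX1 (sense, exon 1), rejoined across the line break in the text.
SX1 = "GACACTTTGCGTTCGGGCTGGGAG"

if __name__ == "__main__":
    print(len(SX1))                 # 24
    print(gc_fraction(SX1))         # 0.625
    print(reverse_complement(SX1))  # antisense-strand view of the primer
```

A sequence that still contains spaces or stray characters raises a `KeyError` in `reverse_complement`, which is a quick way to spot leftover extraction damage.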

Direct sequencing of double-stranded PCR products
Double-stranded DNA fragments produced by PCR amplification were
eluted from agarose gels and purified by extraction with phenol/
chloroform. 200 ng of purified PCR product were mixed with human
p53-specific oligonucleotides as sequencing primers, frozen in dry ice,
dried in a centrifugal evaporator (Savant SpeedVac), redissolved in
sequencing buffer (40 mM Tris-HCl pH 7.5, 25 mM MgCl2, 50 mM
NaCl, 10% DMSO) and subjected to the sequencing reaction as described
by Winship (1989).

RNase protection assay 

Plasmid p729 was constructed from three DNA fragments in two stages.
A 330 bp DNA fragment derived from the human p53 gene promoter
was excised from the p2E-H2BX plasmid (Lamb and Crawford, 1986)
with HindIII and XbaI and inserted into the pGEM-4 plasmid (Promega)
between the HindIII and XbaI sites. In the second stage, a fragment
corresponding to the 5' end of p53 mRNA was obtained by RT-PCR
using p53 mRNA prepared from OCI/AML-3 cells and the p53-specific
primers 5'UTR-3 (sense, exon 1, CCGGAAGCTTCAAAA^^GAGCCACCGTCCAG)
and 5'AX4 (antisense, exon 4, GGTGTAGGAGCTGCTGCTGGTGC).
The resulting fragment was end-filled with
the Klenow fragment of DNA polymerase I, digested with XbaI at the
site present in the 5'UTR-3 primer (shown underlined) and inserted
between the XbaI and SmaI sites present in the plasmid generated in the
first stage.

p729 was linearized with HindIII and a 729 nucleotide antisense probe
was prepared by transcription with SP6 RNA polymerase. The in vitro
transcription reaction mixture contained 50 mM Tris-HCl pH 8.0, 10 mM
MgCl2, 4 mM spermidine, 10 mM NaCl, 0.5 mM each of ATP, GTP and
CTP, 12 µM UTP, 5 µCi [32P]UTP, 10 mM dithiothreitol, 20 units of
RNAguard, 0.5 ng of linearized template and 10 units of SP6 RNA
polymerase in a final volume of 20 µl. After incubation at 37°C for
60 min, the DNA template was digested with DNase I and the RNA
probe was extracted with phenol/chloroform, precipitated with ethanol
and resuspended in water. This RNA probe covered the entire p53 gene
promoter region and included the first three exons and a part of the
fourth exon. p53 transcripts initiating from one of the major start sites
should yield protected fragments of ~385 nucleotides. p53 transcripts
originating from the most 5' of the minor start sites should yield
protected fragments of 449 nucleotides (Tuck and Crawford, 1989).

In the RNase protection assay, 30 µg of total RNA was mixed with
1 x 10^5 c.p.m. of the labelled probe and precipitated with ethanol. The
RNA/probe mixtures were then washed, dried and resuspended in 10 µl
of hybridization solution (Winter et al., 1985), heated to 80°C for 10
min and hybridized at 46°C overnight. After hybridization, the samples
were mixed with 0.18 ml of RNase digestion mix containing 60 µg/ml
of RNase A (type III, Sigma) and 1100 U/ml of RNase T1 (Boehringer
Mannheim) in 300 mM NaCl, 5 mM EDTA, 10 mM Tris-HCl pH 7.5.
After incubation at 37°C for 60 min, the digestion was terminated by
addition of 10 µl of 20% SDS and 5 µl of proteinase K (10 mg/ml)
(Boehringer Mannheim) and incubation at 37°C for 15 min. Protected
fragments were extracted with phenol/chloroform, precipitated with
ethanol, resolved by denaturing gel electrophoresis and visualized by
autoradiography. 

Polysome analysis 

5 x 10^7 cells were washed once in ice-cold Tris-saline solution (25 mM
Tris-HCl pH 7.5, 25 mM NaCl) containing 10 mM MgCl2 and 10 µg/ml
cycloheximide. The cells were then immediately lysed on ice with the
use of a Dounce homogenizer in 2 ml homogenization buffer containing
25 mM Tris-HCl pH 7.5, 25 mM NaCl, 10 mM MgCl2, 2% Triton
X-100, 340 U/ml heparin (LEO Laboratories Canada Ltd), 2 mM
vanadyl ribonucleoside complex (Sigma), 25 mM PMSF, 10 µg/ml
cycloheximide, 1 mM dithiothreitol and 1 mM EGTA. The extract was
centrifuged at 14 000 r.p.m. for 6 min at 4°C to remove cell debris, the
supernatant was collected and layered over a 15-50% linear sucrose
gradient (11 ml) prepared in homogenization buffer. The gradients were
centrifuged in an SW41 Beckman rotor at 175 000 g for 110 min at
4°C. Ten fractions of equal volume were collected from the bottom of 
the tubes. RNA was prepared from each of the fractions by phenol/ 
chloroform extraction and ethanol precipitation and resuspended in 
XX) µl DEPC-treated water. The amount of p53 mRNA in each fraction
(100 µl of the RNA sample) was determined by dot-blot hybridization
analysis using a 32P-labelled human p53 cDNA probe. Polysomes used
to calibrate the gradients were prepared in exactly the same way except
for an additional purification step involving precipitation of the polysomes
present in the homogenate with 100 mM MgCl2 for 1 h on ice before
sucrose gradient sedimentation. For calibration, 0.3-ml fractions were
collected from the bottom of the gradient and the A254 of each fraction was
determined. 

Templates for in vitro transcription and translation
Plasmid p2516 contains nearly full-length human wild-type p53 cDNA
and was constructed by the correct ligation of three cDNA fragments.
One fragment corresponding to the 5' end of the p53 transcript was
obtained from pR4-2 (Harlow et al., 1985) after digestion with XbaI
and PvuII, which cut in exons 1 and 5, respectively. The middle fragment
was obtained from pProSp53 (Matlashewski et al., 1987) after digestion
with PvuII and BamHI, which cut in exons 5 and 11, respectively. The
third fragment corresponding to the 3' end of the p53 transcript was
obtained by RT-PCR amplification of the 3'UTR of p53 mRNA using
p53-specific oligonucleotides as primers, 3'SX13 (sense, exon 11,
GTCACCCCATCCCACACCCTCC) and AS-4. The PCR-amplified
fragment was end-filled with the Klenow fragment of DNA polymerase
I and digested at an internal BamHI site. These three fragments, which
represent contiguous sequences of the native p53 transcript, were inserted
between the XbaI and SmaI sites of a modified form of the pSP64 vector
(Promega) in which polylinker sequences between the HindIII site and
the XbaI site were deleted. The resulting plasmid is referred to as
p2516 and yields a p53 transcript in vitro starting with the sequence
5' CAATACAAGCTCTAGA 3'. The in vitro transcript is nearly
identical to p53 transcripts originating from the major
transcription initiation sites in vivo, which start with 5' C/^AACTCT^GA 3'
(Tuck and Crawford, 1989). The beginning of identity, corresponding
to an XbaI site in the cDNA, is underlined. Digestion of p2516
with EcoRI provides a template that can produce a synthetic full-length
p53 transcript of 2516 nucleotides. Digestion with BamHI provides a
template for a truncated p53 transcript of 2034 nucleotides that is missing
sequences from the 3'UTR containing the Alu-like element.

The plasmid pCAT-Alu was constructed in two steps. First, the
chloramphenicol acetyltransferase gene was excised from a
plasmid (Fu et al., 1991) with HindIII and BamHI, and inserted into
pSP64 to generate pSP6CAT. Second, the BamHI-EcoRI fragment from
p2516 that contains the Alu-like element present in the p53 3'UTR was
inserted immediately downstream of the CAT gene. The plasmid pCAT-
BS was constructed by removing the SmaI-BamHI fragment of the p53
3'UTR present in p2516 and inserting this fragment in reverse orientation
into pSP6CAT immediately downstream of CAT. This SmaI-BamHI
fragment is missing the Alu-like element present at the distal end of the
p53 3'UTR.

In vitro transcription and in vitro polyadenylation
Plasmid DNAs containing templates for in vitro transcription were
linearized at selected restriction endonuclease sites. Standard transcription
assays (Melton et al., 1984) were performed as described above for the
preparation of antisense RNA probes with the omission of [32P]UTP.
0.5 mM m7G(5')ppp(5')G and 0.05 mM GTP were included in the
reactions to provide efficient capping at the 5' end of synthetic transcripts.
Polyadenylation reactions contained synthetic RNA, 0.2 mM ATP, 50 mM
Tris-HCl pH 8.0, 10 mM MgCl2, 250 mM NaCl, 2 mM MnCl2, 2 mM
dithiothreitol, 1 unit/µl RNAguard (Pharmacia), 500 µg/ml of BSA
(Pharmacia) and 5 units of poly(A) polymerase (Pharmacia) in a 50 µl
final volume (McGrew et al., 1989). After 30 min at 37°C, polyadenylated
RNAs were purified by phenol/chloroform extraction and ethanol pre-
cipitation.

In vitro translation and immunoprecipitation 
Synthetic transcripts were translated in micrococcal-nuclease-treated 
rabbit reticulocyte lysates (Promega) under the conditions recommended 
by the supplier. Reactions containing p53 transcripts were incubated for 
30 min at 30°C in the presence of [35S]methionine and stopped by
addition of dithiothreitol to a final concentration of 1 mM and EDTA
pH 8.0 to a final concentration of 10 mM. Each reaction was then
divided into two aliquots, one for immunoprecipitation with the p53-
specific monoclonal antibody PAb421 and the other for immunoprecipitation
with a control antibody PAb419. Reactions containing CAT or
luciferase transcripts were incubated for 30 min at 30°C in the presence
of [35S]methionine and were stopped by addition of protein sample buffer,
boiled for 5 min and resolved by polyacrylamide gel electrophoresis. 

are completed and then proceed on through mitosis, pro-
ducing normal daughter cells. In contrast, mutants defective
in this cell-cycle checkpoint proceed through anaphase be-
fore assembly of the spindle and attachment of kinetochores
is complete; consequently, they mis-segregate their chromo-
somes, producing abnormal daughter cells that die. Analysis
of these mutants, called bub (budding uninhibited by beno-
myl) and mad (mitotic arrest deficient), should shed light on
the mechanism by which the mitotic checkpoint operates.
The sequence of one of the BUB genes indicates that it en- 
codes a protein kinase, which may influence the activities of 
multiple proteins. 

G1 and G2 Arrest in Cells with Damaged
DNA Depends on a Tumor Suppressor
and Cyclin-Kinase Inhibitor

Cells whose DNA is damaged by irradiation with UV light
or γ-rays or by chemical modification become arrested in
G1 and G2 until the damage is repaired. Arrest in G1 pre-
vents copying of damaged bases, which would fix mutations
in the genome. Replication of damaged DNA also causes
chromosomal rearrangements at high frequency by as-yet
unknown mechanisms. Arrest in G2 allows DNA double-
stranded breaks to be repaired before mitosis by mechanisms
discussed in Section 12.4. If a double-stranded break is not
repaired, the broken distal portion of the damaged chromo-
some is not properly segregated because it is not physically
linked to a centromere, which is pulled toward a spindle
pole during anaphase.

As discussed in Chapter 24, genes whose inactiva-
tion contributes to the development of a tumor are
called tumor-suppressor genes. The most commonly
mutated tumor-suppressor gene associated with human can-
cers is p53, so named because it encodes a phosphorylated
protein with an apparent molecular weight of 53 kDa as
estimated from SDS-polyacrylamide gel electrophoresis. The
p53 protein functions in the checkpoint control that arrests
human cells with damaged DNA in G1, and it contributes
to arrest in G2. Cells with functional p53 arrest in G1 or G2
when exposed to γ-irradiation, whereas cells lacking func-
tional p53 do not arrest in G1 (Figure 13-35).

Although the p53 protein is a transcription factor, un- 
der normal conditions, it is extremely unstable; thus it gen- 
erally does not accumulate to high enough levels to bind to 
p53-control elements and activate transcription. Damaged
DNA somehow stabilizes p53, leading to an increase in its
concentration. One of the genes whose transcription is stim-
ulated by p53 encodes p21CIP, a cyclin-kinase inhibitor
(CKI) that binds and inhibits all mammalian Cdk-cyclin
complexes. As a result, cells are arrested in G1 (and G2) un-
til the DNA damage is repaired and p53 and subsequently
p21CIP levels fall (Figure 13-36).

FIGURE 13-35 Effect of mutation of the p53 tumor-
suppressor gene on G1 DNA-damage checkpoint control.
The distribution of cultured human cells in G1, S, and G2 was
determined by analysis of their DNA content with a fluorescence-
activated cell sorter as described in Figure 13-24. (a) By 8 hours
following exposure of wild-type cells to γ-radiation, cells that
were in the S phase (red shading) had completed DNA synthesis,
entered G2, and then arrested, accounting for the rise in the G2
peak. The absence of S-phase cells indicates that irradiation
prevented new cells from entering the S phase, causing them
to arrest in G1. (b) The presence of an S peak 8 hours after
irradiation of p53- mutant cells indicates that the G1 checkpoint
does not operate in these cells. The increase in the G2 peak
indicates that the checkpoint blocking entry of irradiated cells
into mitosis still operates in these mutant cells. [See S. J. Kuerbitz
et al., 1992, Proc. Natl. Acad. Sci. USA 89:7491; adapted from A. Murray
and T. Hunt, 1993, The Cell Cycle: An Introduction, W. H. Freeman and
Company.]

The E7 oncoprotein of the high risk human papilloma-
virus type 16 (HPV-16), which is etiologically associated
with uterine cervical cancer, is a potent immortalizing
and transforming agent. It probably exerts its oncogenic
functions by interacting with and altering the normal activity
of cell cycle control proteins such as p21WAF1, p27KIP1 and
pRb, transcriptional activators such as TBP and AP-1,
and metabolic regulators such as M2-pyruvate kinase
(M2-PK). Here we show that E7 is a short-lived protein
and its degradation both in vitro and in vivo is mediated
by the ubiquitin-proteasome pathway. Interestingly,
ubiquitin does not attach to either of the two internal
Lysine residues of E7. Substitution of these residues with
Arg does not affect the ability of the protein to be
conjugated and degraded; in contrast, addition of a Myc
tag to the N-terminal but not to the C-terminal residue,
stabilizes the protein. Also, deletion of the first 11 amino 
acid residues stabilizes the protein in cells. Taken 
together, these findings strongly suggest that, like MyoD 
and the Epstein Barr Virus (EBV) transforming Latent 
Membrane Protein 1 (LMP1), the first ubiquitin moiety 
is attached linearly to the free N-terminal residue of E7. 
Additional ubiquitin moieties are then attached to an 
internal Lys residue of the previously conjugated 
molecule. The involvement of E7 in many diverse and 
apparently unrelated processes requires tight regulation 
of its function and cellular level, which is controlled in 
this case by ubiquitin-mediated proteolysis. Oncogene 
(2000) 19, 5944-5950. 

Keywords: human papilloma-virus (HPV); E7; ubiqui- 
tin; proteolysis; N-terminus 



Introduction 

The E7 oncoprotein of the high risk human papillo-
mavirus type 16, which is etiologically associated with
pathogenesis of human uterine cervical cancer, is a
potent immortalizing and transforming protein. Ex-
pression of E7 can transform rodent fibroblasts (Kanda
et al., 1988), and in conjunction with an activated Ras
oncogene, primary rodent cells (Phelps et al., 1988).
Continued expression of the E7 gene is required for the
maintenance of the transformed phenotype (Crook et
al., 1989), and expression of the protein in non-
metastatic mouse cell lines renders the cells metastatic
in nude mice (Chen et al., 1993). In transgenic mice,
co-expression of E7 along with E6, another high risk
HPV oncoprotein, elicits epidermal hyperplasia (Aue-
warakul et al., 1994), verrucose lesions and papillomas
(Greenhalgh et al., 1994). Furthermore, E6 and E7 can
cooperate to induce various tumors when expressed
ectopically in transgenic mice (Arbeit et al., 1993; Pan
and Griep, 1994). Finally, both E7 and E6 are
necessary and sufficient to immortalize their primary
host cells, human squamous epithelial cells (Hawley-
Nelson et al., 1989; Munger et al., 1989).

*Correspondence: A Ciechanover, Department of Biochemistry,
Faculty of Medicine, Technion-Israel Institute of Technology, PO
Box 9649, Efron Street, Bat Galim, Haifa 30196, Israel
Received 17 July 2000; revised 21 August 2000; accepted 3 October
2000

While the molecular mechanisms that underlie the
transforming and immortalizing activity of E7 are still
obscure, the protein appears to exert most of its
oncogenic functions by interacting physically with key
cellular regulatory proteins, which leads to modulation
of their normal activity. One main function of E7 is its
ability to deregulate control of cell cycle progression,
allowing cells to exit G0 and enter S phase. It has been
shown that via its cd2 domain, E7 binds to the cell
cycle regulators p107, p130 and pRb (Arroyo et al.,
1993; Davies et al., 1993; Hu et al., 1995). Normally,
these proteins function as transcriptional repressors
that lead to G1 arrest. It was suggested that the
binding of E7 to these proteins leads to their
dissociation from their complex with E2F, which
correlates with stimulation of E2F-dependent transcrip-
tion. It has also been shown that E7 interacts with
both p27KIP1 (Zerfass-Thome et al., 1996) and p21WAF1
(Funk et al., 1997). Consequently, both proteins fail to
block the activity of Cyclin E/Cdk2 complexes, which
allow transition of the cell across the G1/S border.
Binding of E7 to the Jun component of AP-1 can lead
to activation of AP-1 driven genes (Antinore et al.,
1996). It has also been shown that E7 binds to M2-
pyruvate kinase (M2-PK), lowers its affinity to
phosphoenol-pyruvate, and thus slows influx of
substrates into the tricarboxylic, citric acid cycle
(Zwerschke et al., 1999). This leads to accumulation
of upstream phosphometabolites which serve as
precursors to amino acids and nucleotides. The pool
of these precursors is low in resting cells, but its
expansion is necessary during rapid cell division. E7
can, however, also act through a different mechanism: similar to
targeting of p53 for degradation by HPV E6 (Scheffner
et al., 1990), it has been shown that association of E7
with pRb also targets the repressor for ubiquitin-
mediated degradation (Boyer et al., 1996). Targeting of
pRb, and potentially of other regulatory proteins, for
degradation, may serve as a second mechanism, besides
physical interaction, by which E7 exerts its dereg-
ulatory effects. In the case of pRb, removal of the
protein induces the activity of the E2F family of
cellular transcription factors which are known to
control the expression of the major cell cycle regulatory
genes at the G1/S transition.

It has been reported that E7 is short-lived (Selvey et
al., 1994); however, the system involved in its
degradation and the mechanism(s) that underlie the 
process have remained obscure. Many studies have 
implicated the ubiquitin pathway in the degradation of 
various short-lived key regulatory proteins. It is 
involved in proteolysis and processing of many cellular 
proteins, including cell cycle regulators, oncoproteins 
and tumor suppressors, transcriptional activators, ER 
membrane proteins and cell surface receptors. In most 
cases, ubiquitination of the target protein signals its 
degradation by the 26S proteasome. Degradation of a 
protein via the ubiquitin-proteasome pathway involves 
two discrete and successive steps: (i) covalent attach- 
ment of multiple ubiquitin molecules to the protein 
substrate; and (ii) degradation of the tagged protein by 
the 26S proteasome. Conjugation of ubiquitin involves 
activation and transfer of ubiquitin from the ubiquitin-
activating enzyme, E1, to one of several ubiquitin-
carrier proteins, E2s (known also as ubiquitin-con-
jugating enzymes, UBCs). E2 transfers the activated
ubiquitin moiety to the target substrate that is 
specifically bound to a member of the ubiquitin-protein 
ligase family, E3. Subsequent processive transfer of 
additional activated ubiquitin molecules and their 
conjugation to previously attached moieties, generates 
a polyubiquitin chain that serves as a degradation 
signal for the proteasome. Binding of the substrate to 
E3 plays an essential role in specific substrate 
recognition (for recent reviews on the ubiquitin-
proteasome pathway, see, for example, Kornitzer and
Ciechanover, 2000; Voges et al., 1999). In most cases,
the first ubiquitin molecule is transferred to an ε-NH2
group of an internal Lysine residue of the target
protein. However, targeting of the myogenic transcrip-
tion factor MyoD (Breitschopf et al., 1998) and of the
Epstein Barr Virus (EBV) Latent Membrane Protein-1
(LMP-1; Aviel et al., 2000) involves initial ubiquitina-
tion of the N-terminal residue followed by synthesis of
a poly-ubiquitin chain attached to an internal Lys
residue of this N-terminally attached ubiquitin moiety.
Thus, unlike many known substrates of the ubiquitin
system, degradation of these proteins does not require
any internal Lys residue. It has also been reported that
a mutant lysine-less α chain of the T cell receptor
(TCR) is also degraded by the proteasome, in a process
that depends on an intact ubiquitin system. However, a
role for direct ubiquitination of the substrate, as well
as identification of potential ubiquitination sites, has
not been discerned (Yu and Kopito, 1999). Similarly,
ubiquitin-mediated endocytosis and degradation of the
growth hormone receptor also proceeds in the absence
of any lysine residue (Govers et al., 1999). The inability
to identify ubiquitin adducts of the two receptors led
to the hypothesis that ubiquitination of another, yet to
be identified factor, plays a role in the endocytic 
process. 

Discovery of additional substrates is essential in 
order to establish N-terminal ubiquitination as a novel 
targeting pathway, to analyze the structural motifs
involved, to identify the conjugating enzymes, and in
particular the ubiquitin ligase, E3, and to study the
physiological significance of this new pathway.

Here we show that HPV-16 E7 is a novel substrate
of the ubiquitin pathway that is targeted for degrada- 
tion via N-terminal ubiquitination. 



Results 

Degradation of E7 in a cell free reconstituted system
requires ATP, formation of a polyubiquitin chain and
the ubiquitin-carrier protein E2-F1

To study the mechanisms that underlie the degradation
of E7, we reconstituted a cell free proteolytic system.
As can be seen in Figure 1a, degradation of the protein
requires three components: ATP, ubiquitin, and the
ubiquitin carrier protein (E2) E2-F1 (E2-F1 is the
rabbit homolog of the human UbcH7). Omission of
any one of these components from the reaction
mixture abolished degradation. To further study the
mechanism of ubiquitin action, we investigated whether
formation of a polyubiquitin chain is required to
promote degradation. To that end, we used the
methylated derivative of ubiquitin (MeUb) that can modify
the target protein only once and serves as a chain
terminator (Hershko and Heller, 1985). As can be seen
in Figure 1b, MeUb strongly inhibited degradation of
E7 in the cell free system. This inhibition can be
alleviated by the addition of excess free WT ubiquitin.
To demonstrate directly generation of a substrate
anchored polyubiquitin chain, we incubated labeled
E7 in crude HeLa cell extract in the absence or
presence of ATPγS. This nucleotide can support the
activity of the ubiquitin activating enzyme E1 (in which
the α-β bond is utilized), but not the activity of the 26S
proteasome that requires cleavage of the β-γ bond
(Johnston and Cohen, 1991). As can be seen in the
experiment depicted in Figure 1c, incubation of labeled
E7 in the presence of ATPγS generates a polyubiquitin
chain that is anchored to the substrate.

Degradation of E7 in cells is mediated by the proteasome 

To study the mechanism(s) that underlie degradation of
E7 in vivo, we followed the stability of the protein in cells
in the absence or presence of the specific proteasome
inhibitor lactacystin. As can be seen in Figure 2, E7 is a
short lived protein. Measurements in different experi-
ments have demonstrated that the half life of the protein
is ~30-40 min (see also Figure 7). Here, after 1 h, more
than 70% of the protein is degraded. Addition of
lactacystin inhibited degradation completely. 
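
The two figures quoted above (a half life of roughly 30-40 min, and more than 70% of the protein degraded after 1 h) can be related by a short calculation. The Python sketch below assumes simple first-order (single-exponential) decay, which the text does not state explicitly. The function names are hypothetical and the numbers are illustrative, not measured values.

```python
import math


def fraction_remaining(t_min, half_life_min):
    """Fraction of protein left after t_min, assuming first-order decay."""
    if half_life_min <= 0:
        raise ValueError("half life must be positive")
    return 0.5 ** (t_min / half_life_min)


def half_life_from_fraction(fraction_left, t_min):
    """Half life implied by the fraction left after t_min (first-order decay)."""
    if not 0 < fraction_left < 1:
        raise ValueError("fraction_left must be between 0 and 1")
    return t_min * math.log(2) / -math.log(fraction_left)


if __name__ == "__main__":
    for half_life in (30, 40):
        degraded = 1 - fraction_remaining(60, half_life)
        print(f"t1/2 = {half_life} min: {degraded:.0%} degraded after 60 min")
    # Half life implied if exactly 30% of the protein remains after 60 min
    print(f"{half_life_from_fraction(0.30, 60):.1f} min")
```

Under this assumption a 30 min half life leaves 25% of the protein after 1 h and a 40 min half life leaves about 35%, so degradation of more than 70% within 1 h corresponds to the lower part of the quoted range.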

Degradation of a lysine-less E7 in a cell free reconstituted
system requires ATP, formation of a polyubiquitin chain
and the ubiquitin-carrier protein E2-F1

We have previously shown that degradation of the
transcriptional activator MyoD requires attachment of
the first ubiquitin moiety to the N-terminal free amino
acid residue and not to any internal Lys residue of the
protein (Breitschopf et al., 1998). Similarly, ubiquitin-
mediated degradation of the Latent Membrane Protein
1 (LMP1) of the Epstein-Barr Virus is not dependent
on the single internal Lys residue of the molecule, and
also requires initial fusion of ubiquitin to the N-
terminal residue (Aviel et al., 2000). To study whether
a similar mechanism is also involved in the degradation
of E7, we replaced the two Lys residues in positions 60
and 97 with Arg. As can be seen in the experiment
depicted in Figure 3, similar to the WT protein,
degradation of the lysine-less E7 in a cell free
reconstituted system also requires ubiquitin, E2-F1,
and ATP (Figure 3a). Degradation requires formation
of a polyubiquitin chain (Figure 3b,c). As noted for
MyoD and LMP1, the amount of ubiquitin adducts
formed is lower in the case of the lysine-less mutant
(compare c in Figures 1 and 3), suggesting that in the
WT protein, internal Lys residues can also play a role,
though not an essential one, in the proteolytic process.

Figure 1 E7 is conjugated to ubiquitin and degraded in vitro in
an ATP-, E2-F1-, and ubiquitin-dependent manner. (a) Degrada-
tion of E7 requires ATP, ubiquitin and E2-F1. Degradation of in
vitro translated and 35S-methionine labeled E7 was monitored in a
cell free reconstituted system that contained reticulocyte Fraction
II as described under Materials and methods. Ubiquitin, ATP,
and E2-F1 were added as indicated. To avoid contamination of
the labeled substrate with E2-F1, it was fractionated over DEAE
prior to its addition to the reaction mixture as described under
Materials and methods. (b) Degradation of E7 requires ubiquitin
and formation of a polyubiquitin chain. Degradation of in vitro
translated and labeled E7 was monitored in a cell free
reconstituted system that contained HeLa cell Fraction II and
ATP as described under Materials and methods. Ubiquitin and
MeUb were added as indicated. (c) Conjugation of ubiquitin to
E7. In vitro translated and labeled E7 was incubated in complete
HeLa cell extract in the absence or presence of ATPγS as
indicated and as described under Materials and methods



Figure 2 Degradation of E7 in cells is sensitive to proteasome
inhibition. Cos 7 cells were transiently transfected with cDNA
coding for E7. Degradation of the protein was monitored in a
pulse-chase labeling and immunoprecipitation experiment in the
absence or presence of the cell permeable proteasome inhibitor
clasto-lactacystin β-lactone as described under Materials and
methods



Degradation of lysine-less E7 in cells depends on an active 
proteasome 

Similar to the WT protein, degradation of the lysine- 
less mutant in cells is also mediated by the proteasome 
(Figure 4). 

Blocking of the N- but not of the C-terminus of E7 inhibits
both conjugation and degradation of E7 both in vitro and
in vivo

To study the involvement of the N-terminal domain in 
targeting the protein for N-terminal residue ubiquitina- 
tion, we fused a 6 x Myc tag to the N-terminal and the C- 
terminal residues of E7. We predicted that if the N-
terminal domain is involved in specific recognition of the 
protein, and ubiquitin will be fused only to a certain 
amino acid sequence, moving of this sequence down- 
stream from the N-terminal domain will inhibit both 
conjugation and degradation. Indeed, as can be seen in 
Figure 5, conjugation of an N-terminally Myc-tagged 
WT E7 (that contains the two internal Lys residues) is 
strongly inhibited compared to a similar protein that
contains a Myc tag fused to its C-terminal residue 
(Figure 5a). Similarly, while the degradation of a C- 
terminally Myc-tagged E7 in a cell free system proceeds 
efficiently, the degradation of an N-terminal Myc-tagged 
WT protein is strongly inhibited (Figure 5b). Not 
surprisingly, the degradation of a similar Lysine-less 
mutant is also inhibited (Figure 5b). We noted that the 
C-terminally tagged mutant migrates slower than its N- 
terminally tagged counterparts; the reason for this 
peculiar behavior is not known. Similarly, in cells, N- 
terminally tagged WT and lysine-less E7s are stable, 
unlike the C-terminally tagged WT protein (Figure 6). 

The N-terminal domain of E7 contains the 
ubiquitination signal 

To study directly the role of the N-terminal domain of 
E7 as a ubiquitination signal, we deleted the first 11 or
7 amino acids of the protein. As can be seen in Figure 
7a, deletion of the first 11 residues stabilizes the 

Figure 3 Lysine-less E7 is conjugated to ubiquitin and degraded
in vitro in an ATP-, E2-F1-, and ubiquitin-dependent manner. (a)
Degradation of lysine-less E7 requires ATP, ubiquitin and E2-F1.
Degradation of in vitro translated and 35S-methionine labeled
lysine-less E7 was monitored in a cell free reconstituted system
that contained reticulocyte Fraction II as described under
Materials and methods. Ubiquitin, ATP, and E2-F1 were added
as indicated. To avoid contamination of the labeled substrate with
E2-F1, it was fractionated over DEAE prior to its addition to the
reaction mixture as described under Materials and methods. (b)
Degradation of lysine-less E7 requires ubiquitin and formation of
a polyubiquitin chain. Degradation of in vitro translated and
labeled lysine-less E7 was monitored in a cell free reconstituted
system that contained HeLa cell Fraction II and ATP as described
under Materials and methods. Ubiquitin and MeUb were added
as indicated. (c) Conjugation of ubiquitin to lysine-less E7. In
vitro translated and labeled lysine-less E7 was incubated in
complete HeLa cell extract in the absence or presence of ATPγS
as indicated and as described under Materials and methods

Figure 4 Degradation of lysine-less E7 in cells is sensitive to
proteasome inhibition. Cos 7 cells were transiently transfected
with cDNA coding for lysine-less E7. Degradation of the protein
was monitored in a pulse-chase labeling and immunoprecipitation
experiment in the absence or presence of the cell permeable
proteasome inhibitor clasto-lactacystin β-lactone as described
under Materials and methods

Figure 5 Conjugation (a) and Degradation (b) of N- and C-
terminally Myc-tagged WT and lysine-less E7 in vitro. (a) ATPγS-
dependent conjugation of N- and C-terminally Myc-tagged WT
E7. N-terminally (N-MT) and C-terminally (C-MT) Myc-tagged
in vitro translated and 35S-methionine-labeled WT-E7 were
subjected to ATPγS-driven conjugation in a complete HeLa cell
extract that also contains ubiquitin. Reactions were carried out
and conjugates resolved via SDS-PAGE as described under
Materials and methods. (b) Degradation of N- and C-terminally
Myc-tagged WT, and N-terminally Myc-tagged lysine-less E7 in a
cell free system. WT [WT (N)] and lysine-less [LL (N)] N-
terminally Myc-tagged and WT [WT (C)] C-terminally Myc-
tagged E7 were subjected to degradation in a cell free system
containing complete HeLa cell extract, ubiquitin, and ATP.
Reactions were carried out and proteins resolved via SDS-PAGE
as described under Materials and methods



protein. In contrast, degradation of a mutant protein 
from which the first 7 amino acid residues were deleted, 
proceeded similarly to that of the WT protein (Figure 
7b). The finding that the signal is constituted of a 
relatively long segment raises the hypothesis that it 
serves not only as an anchor for specific ubiquitination,
but also as a binding site for the ubiquitin ligase E3. 



Discussion 

Ubiquitin-mediated degradation of multiple key reg- 
ulatory proteins is involved in the regulation of many 
basic cellular processes. Here we show that the HPV E7 
oncoprotein is a substrate for the ubiquitin system. It is 
degraded in an ATP-dependent manner in a process
that also requires ubiquitin and the ubiquitin-carrier
protein E2-F1/UbcH7 (Figure 1a). It is not clear
whether this E2 is the only carrier protein involved,
or whether other E2s, such as members of the UbcH5
family that are involved in the degradation of the bulk

Figure 6 Degradation of N-terminally Myc-tagged WT and
lysine-less and C-terminally Myc-tagged WT E7 in cells. Cos 7
cells were transiently transfected with cDNAs coding for WT
[WT-MT(N)] and lysine-less [LL-MT(N)] N-terminally Myc-
tagged and C-terminally [WT-MT(C)] Myc-tagged E7. Stability
of the proteins was monitored in a pulse-chase labeling and
immunoprecipitation experiment as described under Materials
and methods



of cellular proteins, are also required. The identity of 
the ubiquitin-protein ligase E3 that binds the substrate 
specifically is still obscure. Degradation of E7 requires 
formation of a polyubiquitin chain: the chain terminator
methylated ubiquitin inhibits degradation of E7,
and inhibition can be relieved by the addition of excess
free WT ubiquitin (Figure 1b). Indeed, we were able to
demonstrate directly formation of polyubiquitinated E7
following incubation of the protein in the presence of
ATPγS (Figure 1c). This nucleotide promotes formation
of conjugates, but inhibits their degradation. In
cells, E7 is extremely unstable and has a t1/2 of ~30-40 min.
Incubation of cells in the presence of
lactacystin, a specific proteasome inhibitor, inhibited 
degradation almost completely (Figure 2). 
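
A half-life such as the ~30-40 min quoted above is typically read off pulse-chase data by assuming first-order decay, so that ln(signal) falls linearly with chase time and t1/2 = ln 2 / k. The following is a minimal sketch of that calculation, not the authors' procedure; the function name and the chase values are hypothetical and are not data from this study.

```python
import math

def half_life_from_chase(times_min, signals):
    """Estimate t1/2 (min) by a least-squares fit of ln(signal) against chase time.

    Assumes simple first-order decay; signals must be positive.
    """
    if len(times_min) != len(signals) or len(times_min) < 2:
        raise ValueError("need at least two matched time points")
    logs = [math.log(s) for s in signals]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    slope = sxy / sxx  # equals -k for first-order decay
    if slope >= 0:
        raise ValueError("signal does not decay; half-life is undefined")
    return math.log(2) / -slope

# Hypothetical chase data (percent of the time-0 signal), for illustration only.
chase_times = [0, 30, 60, 120]
chase_signal = [100.0, 55.0, 30.0, 9.0]
print(round(half_life_from_chase(chase_times, chase_signal), 1))
```

A stabilized protein (for example in the presence of a proteasome inhibitor) would give a slope close to zero, in which case the function raises an error or returns a very long half-life instead of a misleading short one.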

To further dissect the mechanisms that underlie the 
recognition and degradation of E7, it was important to 
study whether a specific Lys residue is essential for 
formation of the polyubiquitin chain, or whether the 
two Lys residues in positions 60 and 97 can equally 
serve as anchors to the polyubiquitin chain. We 
individually replaced each of the two Lys residues
with Arg; however, we could not observe any effect on
either ubiquitin-mediated conjugation and degradation
in vitro, or on the stability of the protein in vivo (not
shown). Therefore, we decided to replace both
residues. However, when incubated in a cell free
reconstituted system, and similar to the behavior of 
the WT protein, degradation of the lysine-less E7 was 
still dependent on ATP, E2-F1, and ubiquitin (Figure 3).
Also, in cells, degradation of the lysine-less protein
was sensitive to inhibition of the proteasome (Figure 4).
As no lysine residues were available for ubiquitination,
we suspected that the first ubiquitin residue is
attached to the free N-terminal NH2 group, as is the
case for MyoD (Breitschopf et al., 1998) and EBV
LMP1 (Aviel et al., 2000). To study the possibility that
E7 is also targeted via initial N-terminal ubiquitina- 
tion, and with the assumption that the N-terminal 
domain of the protein determines the specificity for 
this process, we altered the N-terminal domain of both
WT and the lysine-less E7 by fusing it with a Myc-tag. 
For both forms, tagging resulted in major stabilization 
of the protein in vitro (Figure 5b) and in vivo (Figure 
6). Concomitantly, we noted a marked decrease in 
conjugation of the tagged WT protein in a cell free 
system (Figure 5a). This decrease occurred despite the 
presence of the two internal Lys residues in the 

Figure 7 Degradation of 11 (a) and 7 (b) amino acid residues N-
terminally deleted E7s in cells. (a) Degradation of 11 amino acid
residues N-terminally deleted WT E7. Cos 7 cells were transiently
transfected with a cDNA that codes for E7 lacking the first 11
amino acid residues (residues 2-12; ΔN11). Stability of the WT
and the deletion mutant were monitored in a pulse-chase labeling
and immunoprecipitation experiment as described under Materials
and methods. Band intensities were quantified by analysis of
the imaging data and plotted as relative percentage of the signal
at time 0. (b) Degradation of 7 amino acid residues N-terminally
deleted WT E7. Cos 7 cells were transiently transfected with a
cDNA that codes for E7 lacking the first 7 amino acid residues
(residues 2-8; ΔN7). Stability of the WT and the deletion mutant
and quantitative analysis of the data were carried out as described
under Materials and methods and above



protein. The effects of Myc tagging on both conjuga- 
tion and degradation are even more striking if one 
takes into consideration the existence of six additional
lysine residues in the tag. The finding that the Myc-
tagged WT E7 can still be conjugated is probably due
to the existence of the two internal lysine residues.
Interestingly, while after 60 min the protein appears
stable (Figure 6), it is still degraded at a much slower
rate in a ubiquitin-dependent mode (not shown). This
finding demonstrates that the internal lysine residues 
(either of the protein and/or of the tag) can play a role 
in the process, though not a major one. In the native 
protein, they may serve as modulators of proteolysis. 
A similar observation was noted also for MyoD and 

protein that is targeted for degradation following
ubiquitination of the N-terminal residue by blocking
the access of ubiquitin, the ligase, or both, to a specific
ubiquitination site and/or recognition motif at the N-
terminal domain. To test this notion directly, we
deleted the first 7 and 11 N-terminal amino acid
residues. As can be seen in Figure 7, deletion of the 
first 11 residues stabilized the protein significantly. In 
contrast, deletion of the first seven residues was not 
sufficient to confer stability. Our initial results indicate 
that deletion of amino acid residues 8-11 results in 
stabilization of the protein, suggesting that they may 
play an important role in the recognition and targeting 
of the protein (unpublished data). Obviously, it will be 
important to study whether this sequence can serve as 
a 'universal' transferable N-terminal destabilizing 
element. It should be noted, however, that we could 
not find sequence homology between the N-terminal 
regions of E7, MyoD, LMP1, and TCRα (it is not
clear that this protein is degraded via N-terminal
ubiquitination; Yu and Kopito, 1999), but such
functional motifs or epitopes may be generated 
following folding of the protein rather than at the 
primary sequence level. It is quite possible that the 
motifs are different and targeted by different ligases. 
Future identification of the E3 will be necessary to 
resolve the question of whether the N-terminal region 
serves also as a recognition site for the ligase. In this 
context it is worth mentioning the case of the Cdk 
inhibitor p21Cip1, where the researchers suggested that,
similar to ODC, the protein is targeted by the
proteasome in a process that does not involve
ubiquitination (Sheaff et al., 2000); a lysine-less
N-terminally Myc-tagged protein is still unstable and
degraded in a proteasome-dependent manner. 

It should be emphasized that N-terminal ubiquitina- 
tion is different from the recognition via the N-end rule 
(Varshavsky, 1996) where the protein is recognized via 
the N-terminal residue, but conjugation occurs on 
internal lysines. 

It is clear that for N-terminal ubiquitination to 
occur, a protein must have a free and exposed N- 
terminal residue. Thus, proteins acetylated at the N- 
terminal residue cannot be targeted via this pathway. 
While 80% of cellular proteins are modified at the N-
terminal residue by acetylation (Jörnvall, 1975), 20%
bear free and exposed N-termini. It is possible that 
these proteins are targeted via N-terminal ubiquitina- 
tion. Indeed, in agreement with the set of 'rules' that 
determines, according to the three N-terminal residues, 
whether a protein will be acetylated, E7, which has
MHG as its first three residues, should not be
modified. However, since the set of 'rules' was
determined for an extremely limited number of
proteins, mostly yeast proteins (Polevoda et al.,
1999; Boissel et al., 1988; Huang et al., 1987), it is
not clear whether it is possible to predict which 
proteins will be subjected to N-terminal ubiquitination.
The hypothesis, however, can be tested experimentally.
Furthermore, manipulation of proteins by
altering their N-terminal domain and rendering them 
susceptible or resistant to acetylation, can also 
corroborate or rule out the notion that N-terminal 
ubiquitination is the pathway of 'choice' for free and 
exposed N-termini proteins. 



Materials and methods 

Materials 

Materials for SDS-PAGE and Bradford reagent were from
Bio-Rad. A mixture of L-35S-labeled methionine and cysteine
for metabolic labeling, [35S]methionine for in vitro translation,
as well as pre-stained MW markers and immobilized Protein
G, were obtained from Amersham Pharmacia Biotech. Tissue
culture sera and media were from Biological Industries (Bet
Ha'emek, Israel) or from Sigma. Antibody against E7 was
from Santa Cruz. clasto-Lactacystin β-lactone was from
Calbiochem. Ubiquitin, dithiothreitol (DTT), adenosine-5'-
triphosphate (ATP), phosphocreatine, creatine phosphokinase,
2-deoxyglucose and Tris(hydroxymethyl)aminomethane
(Tris buffer) were from Sigma. Hexokinase and
Fugene™ 6 transfection reagent were from Roche Molecular
Biochemicals. Wheat germ extract-based transcription-translation
coupled kit (TNT®) was from Promega. Restriction
and modifying enzymes were from New England Biolabs.
Oligonucleotides were synthesized by Biotechnology General
(Rehovot, Israel). All other reagents were of high analytical
grade.

Methods 

Cell lines Cos-7 cells were grown at 37°C in Dulbecco's Modified
Eagle's Medium supplemented with 10% fetal calf serum 
(FCS). All transfections were carried out using the Fugene™ 
reagent, and cells were analysed after 36-48 h. 

Plasmids and construction of mutant cDNAs WT and lysine-less
mutant E7 cDNAs were subcloned into the EcoRI-XbaI
site of the pCS2 and pCS2+MT vectors (Breitschopf
et al., 1998; Aviel et al., 2000). These vectors were used for
both in vitro translation (under the control of SP6 RNA
polymerase) and expression in mammalian cells. Point
mutations in E7 were generated by site-directed mutagenesis
using the QuikChange™ kit (Stratagene). Deletion of the
first 7 (ΔN7) or 11 (ΔN11) N-terminal amino acid residues
of E7 was carried out using PCR and specific primers. PCR
products were digested with EcoRI and XbaI and ligated
into the pCS2 vector. In-frame insertion of 6 x Myc tag in
the N or C termini of E7 was carried out using the
pCS2+MT vector and the appropriate PCR primers.
Sequences of all constructs were confirmed using an
automatic sequencing system (ABI 310).
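
The mutants described above can be expressed as simple sequence operations, which is a convenient way to check a planned construct before designing primers. The sketch below follows the numbering given in the Figure 7 legend, where ΔN7 removes residues 2-8 and ΔN11 removes residues 2-12, so the initiator Met is retained. The function names and the toy sequence are hypothetical; the toy sequence is an arbitrary string of residues and is not the E7 sequence.

```python
def delete_after_met(protein, n_deleted):
    """Remove residues 2..(n_deleted + 1), keeping residue 1 (the initiator Met)."""
    if n_deleted < 0 or n_deleted + 1 > len(protein):
        raise ValueError("deletion does not fit inside the sequence")
    return protein[0] + protein[1 + n_deleted:]

def lysine_less(protein):
    """Replace every Lys (K) with Arg (R), as in a lysine-less mutant."""
    return protein.replace("K", "R")

# Arbitrary 20-residue toy sequence for illustration only (NOT E7).
toy = "MAGSTVLIPFWYKDEKNQHC"

delta_n7 = delete_after_met(toy, 7)    # residues 2-8 removed
delta_n11 = delete_after_met(toy, 11)  # residues 2-12 removed
print(delta_n7, delta_n11, lysine_less(toy))
```

Applied to a real sequence, the same two helpers give the expected protein products of the deletion and Lys-to-Arg constructs, which can then be compared with the translated sequencing reads.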

Preparation and fractionation of crude reticulocyte lysate Reticulocytes
were induced in rabbits and lysates were prepared
as described (Hershko et al., 1983). The lysate was
fractionated over DEAE cellulose into unadsorbed material
(Fraction I) and high salt eluate (Fraction II) as described
(Hershko et al., 1983). E2-F1 was prepared from Fraction I
as described (Blumenfeld et al., 1994). HeLa cell extract was
prepared by hypotonic lysis as described previously (Orian et
al., 2000) and fractionated as described above.

Conjugation and degradation of E7 in a cell free reconstituted
system The E7 cDNAs were translated in the presence of
[35S]methionine using wheat germ coupled transcription-
translation extract (TNT®, Promega) and SP6 RNA
polymerase. When indicated, the crude lysate that contains
the labeled substrate was fractionated over DEAE resin to
Fraction I and II as described above. The labeled substrate
was contained in Fraction II. Conjugation and degradation
assays in a cell-free system were performed as described
elsewhere (Breitschopf et al., 1998; Aviel et al., 2000). Briefly,
the reaction mixture contained in a final volume of 12.5 µl: 50 µg
whole HeLa cell lysate proteins, or 50 µg reticulocyte
Fraction II and 1 µg E2-F1 as indicated, 5 µg ubiquitin,
and ~25 000 CPM of in vitro translated labeled E7.
Reactions were performed in the presence of 0.5 mM ATP
and an ATP-regenerating system (10 mM phosphocreatine
and 5 µg phosphocreatine kinase), or ATPγS (5 mM) as
indicated. For depletion of ATP, 0.5 µg hexokinase and
20 mM deoxyglucose were added. When indicated, the chain
terminator methylated ubiquitin (MeUb; Hershko and Heller,
1985) was added at 5.0 µg. In these reactions ubiquitin was
present at 1.0 µg. To overcome the inhibition of MeUb,
15 µg of ubiquitin were added. Conjugation assays contained
in addition 0.5 µg of the isopeptidase inhibitor ubiquitin
aldehyde (UbAl; Hershko and Rose, 1987). Degradation
reactions were carried out at 37°C for 2 h, whereas
conjugation assays were incubated at 37°C for 1 h. Reactions
were terminated by the addition of sample buffer and
resolved by SDS-PAGE (15%). E7 was visualized by
PhosphorImager (Fuji, Japan).
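
Because the reaction components are given as masses in a fixed 12.5 µl volume, it can help to convert them to concentrations and to see how strongly the chain terminator dominates the ubiquitin pool in each condition. The sketch below is only arithmetic on the amounts stated above. Two assumptions are made: MeUb and ubiquitin are compared by mass (methylation changes the mass only slightly), and the 15 µg of ubiquitin used for rescue is treated as the total free ubiquitin, since the text does not say whether it is added on top of the 1.0 µg already present. The function names are hypothetical.

```python
REACTION_VOLUME_UL = 12.5  # final reaction volume stated in the text

def mass_concentration_mg_per_ml(mass_ug, volume_ul=REACTION_VOLUME_UL):
    """Convert a mass in µg to mg/ml (µg per µl is numerically equal to mg per ml)."""
    return mass_ug / volume_ul

def meub_fraction(meub_ug, ub_ug):
    """Fraction of the total ubiquitin pool (by mass) that is methylated."""
    return meub_ug / (meub_ug + ub_ug)

lysate = mass_concentration_mg_per_ml(50)     # HeLa lysate or Fraction II protein
ubiquitin = mass_concentration_mg_per_ml(5)   # ubiquitin in the standard reaction
inhibited = meub_fraction(5.0, 1.0)           # 5.0 µg MeUb with 1.0 µg ubiquitin
rescued = meub_fraction(5.0, 15.0)            # assumption: 15 µg total ubiquitin
print(lysate, ubiquitin, round(inhibited, 3), round(rescued, 3))
```

Under these assumptions the chain terminator makes up most of the pool in the inhibited reaction and a minority after excess ubiquitin is added, which is the rationale for the rescue condition.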

Stability of proteins in vivo Cellular stability of E7 proteins
was monitored in a pulse-chase labeling and immunoprecipitation
experiment as described (Breitschopf et al., 1998).
The proteasome inhibitor clasto-lactacystin β-lactone (10 µM)
was added 20 min prior to the end of the labeling period
(pulse) and was present throughout the experiment. Following
labeling, cells were harvested (time 0: pulse) or were
further incubated for the indicated periods of time (chase).
Cells were lysed, and the labeled proteins were precipitated
using anti-E7 antibody. Immune complexes were collected
using immobilized Protein G. Following SDS-PAGE (15%),
proteins were visualized using a PhosphorImager.
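
The Figure 7 legend states that band intensities were plotted as a relative percentage of the signal at time 0. A minimal sketch of that normalization is given below; the function name and the intensity values are hypothetical, and the arbitrary units stand in for whatever the imaging software reports.

```python
def percent_of_time_zero(intensities):
    """Express each band intensity as a percentage of the first (time 0) band."""
    baseline = intensities[0]
    if baseline <= 0:
        raise ValueError("time-0 intensity must be positive")
    return [100.0 * value / baseline for value in intensities]

# Hypothetical band intensities (arbitrary units) at successive chase times.
wt_bands = [8400, 4100, 2300, 600]
print([round(p, 1) for p in percent_of_time_zero(wt_bands)])
```

Normalizing each construct to its own time-0 band lets proteins with different expression levels be compared on the same decay plot.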



Protein concentration Protein concentration was determined
according to Bradford (1976) using BSA as a standard.
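
In a Bradford assay the unknown is read against a BSA standard curve, usually by fitting a straight line to absorbance (commonly read at 595 nm) versus standard concentration and inverting it. The sketch below shows that calculation under the assumption that all readings fall in the linear range of the assay; the function names and the standard values are hypothetical and are not measurements from this study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert the standard curve to obtain protein concentration (mg/ml)."""
    return (absorbance - intercept) / slope

# Hypothetical BSA standards (mg/ml) and their absorbance readings.
bsa_mg_per_ml = [0.0, 0.25, 0.5, 1.0]
absorbance = [0.05, 0.175, 0.30, 0.55]

slope, intercept = fit_line(bsa_mg_per_ml, absorbance)
print(round(concentration_from_absorbance(0.30, slope, intercept), 3))
```

Samples that read above the highest standard should be diluted and measured again, since the line is only meaningful within the range of the standards.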